What’s New and Different in SYSTAT 12


This chapter summarizes the new features and major changes in this version, relative
to SYSTAT 11, with respect to the GUI, data, commands, output, help, graphics, and
statistics. Under each of these headings, new features, modified features and deleted
features are listed. The lists are followed by a brief description of each item in the
same order, with the same serial number. More details are given in the appropriate
chapters of the manual.


GENERAL FEATURES 
GUI 


New Features 
1. Startpage
2. Variable Editor
3. Opening Multiple Data Files (View and Active Modes)
4. Creating a New Data File
5. Opening Multiple Graphs (View and Active Modes)
6. Examples Tab
7. Opening Multiple Command Files
8. Saving Command Files in the ANSI Format
9. Autocomplete Commands

10. Command Coloring 

11. Active Tab at the Beginning of Viewspace/Commandspace 
12. Interlinked Command Line and Dialog Interface 

13. Closing all Open Files 

14. Quick Access Menu 

15. Window Menu - Multiple Views 

16. Interface Themes 

17. Case Selection Tools 

18. Crash Recovery and Rescue System 

19. Windows XP Style Interface 


Modified Features 
20. Appearance of Data Editor 
21. Data File Comments in Data Editor 
22. Up and Down Arrow Keys 
23. Default Menu Structure 
24. Menu Customization 
25. Status Bar 
26. Additional Shortcut Keys 
27. Context Menus 
28. Dialog Boxes 


Deleted Features 
29. FEdit 
30. Printing Command Files 
31. Saving from the Interactive Tab of the Commandspace 
32. Opening .CMD (Command) Files 


Data


New Features 
33. New Data File Format with Compression 
34. Date AND Time Format for a Date-Time Variable 
35. Processing Conditions in Effect Pane 
36. Copying Data as Text 
37. Pasting Data with Custom Row and Column Separators 
38. Pasting Variable Properties 
39. Variable Statistics 
40. Undo and Redo of Data Modifications 


41. Recoding Variables

42. Trimming Data 

43. Centering and Stacking Data Through the Dialog Interface 
44. Trimming Leading and Trailing Spaces in String Data


Modified Features 
45. Field Width and Decimal Places for Numeric Data 
46. Number of Characters for String Data 
47. Variable Name 
48. Variable Labels 
49. Variable Properties Dialog Box 
50. Print Preview of Selected Data 
51. Importing/Exporting Data Files 
52. Pasting Data (Overwriting) through Menus 
53. Inserting Variables from the Clipboard Directly
54. Data Editor Accessible During BASIC, MATRIX and RANDSAMP 
55. Merging Data Files 
56. Appending Data Files 
57. Value Labels 
58. Ordering Categories


Deleted Features 
59. Data Aggregation using Value Labels 
60. Saving in the SYSTAT .SYS Format 
61. Importing/Exporting BMDP Data Files 


Commands 


New Features 
62. !! (for Inserting Comments at the End of a Command Line) 
63. NAMES (GENLAB Option) 
64. RNDGEN (to Set the Random Number Generator) 
65. VDISPLAY (to Control Variable Label Display in the Output) 
66. LDISPLAY (to Control Variable Label Display in the Output) 
67. RECODE (for Recoding Variables) 
68. TRIM (for Trimming Variables) 
69. Temporary Variables 
70. Temporary Arrays 
71. IF .. THEN .. ENDIF 
72. BEGINBLOCK .. ENDBLOCK 
73. WHILE...ENDWHILE 
74. FUNCTION .. RETURN (to Create User-Defined Functions)
75. DATA and DATA$ Functions (to Identify Individual Data Values) 
76. LEN Function (to Determine Length of a String) 
77. NCAT Function (to Determine Number of Categories) 
78. Multicase Functions 


79. Reserved Keywords 


Modified Features
80. Command Syntax Rules
81. BASIC, MATRIX and STATS Modules Made Global
82. PRINT and PLENGTH Commands
83. FPATH (GALLERY and PROJECT Options)
84. DEFVAR
85. INSERT
86. PASTE (After Running CUT Command)
87. DELETE (for ROWS and COLUMNS)
88. SAVE and WORK
89. EXTRACT (VARIABLES Option)
90. CATEGORY
91. FOR...NEXT
92. Conditional FOR...NEXT
93. USE, LET, SELECT, DROP and DELETE in Matrix
94. CLEAR (for Temporary Variables, Arrays and Matrices)
95. CYAN and MAGENTA Color Option Values
96. Common Graph Options
97. STATS Commands
98. TESTING Commands
99. MODEL Command in PROBIT
100. QCREGRESS and RUNCHART in QC
101. Alternative and Replacement Commands


Deleted Features
102. TABLES and GRAPHS Options for OUTPUT
103. Deleted Commands
104. Alternative Notations of Relational/Logical Operators


Output


New Features 
105. HTML Based Output
106. Organizing Output by Data Files
107. Enhanced Print Preview
108. Setting Very Small Font Sizes
109. Wrapping and/or Truncating Text in Tables
110. Inserting Images, Block Format, Indent and Outdent
111. Customizing the Output Scheme
112. Setting Detailed Output Organizer Node Captions


Modified Features 
113. Page Width
114. Saved Output
115. Saving Text Output to File


Deleted Features 
116. Copying and Pasting Output Through Output Organizer
117. Saving as Framed HTML Page
118. Viewing Graphs as Frames Only 


Help 


New Features 
119. Help through the Commandspace
120. Bubble Help
121. Data File References
122. Acronym Expansions
123. Tutorial
124. FAQ


Modified Features 


125. Discussion and Glossary Pages


Graphics


New Features 
125. Interactivity for BEGIN..END Graphs and Quick Graphs
126. Hexagonal Binning (Manual and Automatic)
127. GRADIENT and WIREFRAME Options for DENSITY, PLOT, PPLOT, and FPLOT
128. Turning off Frame Titles
129. Base Line for Anchor Bar
130. Object Tracker for Identifying Individual Graph Objects
131. Repositioning the Graph Title using the Mouse (Drag-and-Drop)
132. Specifying Canvas Background and Borders
133. Origin, Eye, Facet, Depth and Background Color through the Global Options Dialog
134. Global Graph Options Through Dialog
135. Setting the Image Type of Graphs in the Output


Modified Features 
136. Enhanced Interactive Graphics with Single Dialog Interface
137. Percentage Bar Charts
138. Range for ETHICK
139. Stacked Bar Chart
140. Enhanced Highlight Point Tool
141. Enhanced Graph Tooltips
142. More Drawing Attributes
143. More Text Tool Font Options
144. Enhanced Page View


Deleted Features 
145. Some Graph Saving Options
146. Importing Map Files with User Specified Names
147. Some Dynamic Explorer Options
148. Ruler in Page View


STATISTICAL FEATURES 


New Features 
1. AIC and Schwarz's BIC
2. Quade, Anderson-Darling (Nonparametric) Tests
3. Multinormal Tests
4. LAD, M, LTS and S (Robust) Regression
5. Partial Least-Squares Regression
6. Mixed Models
7. Response Surface Methods
8. Trend Analysis (Time Series)


Modified Features 
9. Resampling 
10. New Probability Distributions 
11. Basic Statistics (Column and Row) 
12. Crosstabulations 
13. Smart Correspondence Analysis 
14. Loglinear Models 
15. Association Measures 
16. Least Squares Regression 
17. Logistic Regression

18. Two-Stage Least Squares 

19. REGRESS, ANOVA, GLM - Assumption Checking 
20. ANOVA, GLM, MANOVA - Sums of Squares 

21. ANOVA, GLM, MANOVA - Contrasts 

22. ANOVA, GLM - Pairwise Comparisons 

23. Cluster Analysis 

24. Survival Analysis 


Descriptions for each of the above items are given in the following pages. 


GUI 


New Features 
1. Startpage 
The Startpage appears as the first tab in the Viewspace as you open SYSTAT 12. 
It has five panes: 
■ Recent Files
■ Themes
■ Manuals
■ Tips
■ Scratchpad


Recent Files. This pane displays files opened in the current and previous sessions.
The files are arranged in three categories: 


■ Recent Data Files
■ Recent Command Files
■ Recent Output Files


Each of these categories displays up to 9 of the most recently opened files. Open a 
file by double-clicking its name. 
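Showing at most the nine most recently opened files in each category is what is usually called a most-recently-used (MRU) list. The Python sketch below illustrates the idea only; it is not SYSTAT's implementation, and the class name `RecentFiles`, as well as the assumption that reopening a file moves it back to the top rather than listing it twice, are ours.

```python
from collections import OrderedDict


class RecentFiles:
    """Hypothetical most-recently-used list for one category (e.g. data files)."""

    def __init__(self, max_items=9):
        self.max_items = max_items
        self._paths = OrderedDict()  # front = most recently opened

    def opened(self, path):
        # Assumption: reopening a file moves it to the front, with no duplicate.
        self._paths.pop(path, None)
        self._paths[path] = None
        self._paths.move_to_end(path, last=False)
        # Forget the least recently opened files beyond the cap.
        while len(self._paths) > self.max_items:
            self._paths.popitem(last=True)

    def listing(self):
        """Paths in display order, newest first."""
        return list(self._paths)


recent = RecentFiles()
for i in range(12):
    recent.opened(f"data{i}.syz")  # hypothetical file names
recent.opened("data5.syz")
print(recent.listing()[:3])  # ['data5.syz', 'data11.syz', 'data10.syz']
```

Keeping the entries in an ordered dictionary makes "move to front" and "drop the oldest" cheap while guaranteeing that each file appears only once.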


Tips. SYSTAT displays useful tips for accomplishing various tasks. Click Next Tip 
to see the next tip. 


Themes. SYSTAT provides three interface themes: Classic, Default and
Introductory_Statistics. Each theme consists of a selection of menu items relevant
to the purpose its name reflects. Double-click a name to apply the theme. Or, from
the menus, choose:
Utilities 
Themes 
Apply Theme... 


You can create your own themes, download themes, and apply them. For details,
refer to Chapter 7, Customization of the SYSTAT Environment.


Manuals. This pane contains a list of the user manual volumes installed in your 
system. You can access a volume by double-clicking on its name. 


Scratchpad. This pane allows you to enter notes while working with SYSTAT.
Anything that you enter here is retained across sessions.


2. Variable Editor


Each data file now has two tabs in the Data Editor (shown as Data and Variable at
the bottom edge of the Viewspace): one for editing data and another for editing
variable properties. These are referred to as the Data Editor and the Variable
Editor, respectively. The Data Editor is the same as in previous versions, with a few
enhancements (see 20).


The Variable Editor is organized by variable: each row holds the properties of one
variable, namely its name (numeric/string), variable label, type, categorical status,
number of characters (the field width, in the case of numeric variables), number of
decimal places, display format (normal or exponential notation), date/time format
(if any), and variable comments. You can thus edit the properties of all variables in a
single window.

You can still edit the properties of an individual variable in the Variable Properties
dialog. Previously, you could invoke this dialog by double-clicking a variable; now
you have to right-click its column and select Variable Properties. Double-clicking a
variable's row number in the Variable Editor now takes you to the corresponding
column in the Data Editor, and vice versa.


If you define value labels for any variable, they appear as a tooltip when you pause 
the mouse on the corresponding cell in the Variable Editor. 


3. Opening Multiple Data Files (View and Active Modes)


The Data Editor allows for viewing multiple data files simultaneously. To open a 
file in the view mode, from the menus choose: 
File 

View Data... 
and select the desired data file to view. You can also do the same through the VIEW 
filename.syz command. You can also right-click a data file node in the Output 
Organizer and click View Data. 


Data files that are saved by numeric modules are automatically opened in the
Viewspace for viewing. 


Any data file that is in the view mode, or has a data file node in the Output
Organizer, can easily be opened for editing. Simply right-click the
corresponding tab in the Viewspace, or the desired data file node, and click Set as 
Active Data File. You can distinguish a viewed file from an active data file by the 
distinct coloring of the corresponding tabs in the Viewspace, and the shading of the 
Data/Variable Editor cells. 


By default, when you open a data file through the File menu, the currently active
data file closes after you are prompted to save any unsaved changes. A global
option lets you switch the currently active data file to the view mode instead of
closing it.


To close a data file that is in the view or active mode, right-click on its tab and 
select Close. 


4. Creating a New Data File


You can create a new data file by double-clicking in the empty area next to the tabs 
of the Viewspace. 


5. Opening Multiple Graphs (View and Active Modes)


You can view multiple graphs simultaneously. Simply right-click a graph node in 
the Output Organizer and select View Graph. 


By default, when you generate a new graph, the currently active graph closes. A
global option lets you switch the currently active graph to the view mode instead of
closing it.


To close a graph that is in the view or active mode, right-click on its tab and select 
Close. 


6. Examples Tab


The Examples tab appears in the Workspace and contains a SYSTAT Examples tree 
that has nodes corresponding to the examples appearing in the user manual. The 
nodes are arranged in the following folders: 


■ Application Gallery
■ Demonstrations
■ Data
■ Graphics
■ Statistics
■ Command Templates


These in turn may contain sub-folders corresponding to the respective chapters in 
the user manual. (The structure essentially mimics that of the SYSTAT Command 
folder.) Click the + sign beside a folder or sub-folder to see the examples therein. 
Double-click an example node to execute the corresponding command script and 
view the outputs in the Output Editor. You can run multiple examples by selecting 
them using the Shift or Ctrl keys, right-clicking, and clicking Run. You can 
similarly run one or more folders of examples. You can open the underlying 
command file(s), for editing or selective execution in the Commandspace, using 
Show script in the context (right-click) menu. 


You can even add your own examples to the Examples tab using the Add Examples
dialog, accessible from the Utilities menu. For details, refer to Chapter 7,
Customization of the SYSTAT Environment.


7. Opening Multiple Command Files


Multiple command files can be opened in the Commandspace. With this, any 
number of command files can be edited simultaneously. A batch tab is created for 
each file that you open. Commands can be executed from any of these and in any 
order. Right-click a tab and click Close to close the file, or click Close All to close 
all files. A prompt appears to save any unsaved changes. 


8. Saving Command Files in the ANSI Format


By default, command files are saved with Unicode encoding, which may not be
readable in all text editors. If necessary, you can save command files in the ANSI
format (that is, with Windows encoding) by selecting the appropriate file type in the
Save dialog.


9. Autocomplete Commands


Commands are "autocompleted" as you type them in the Interactive or batch
(Untitled) tab of the Commandspace. When you type a letter, all commands
beginning with that letter appear in a dropdown list. Double-click the desired
command or continue typing. For the USE and VIEW commands, when you press
the Spacebar and then type a letter, the data files in the SYSTAT Data folder are
listed. For any other command, if a data file is open, all available variable names
beginning with that letter appear in a dropdown list. Double-click the desired file or
variable name.
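The dropdown behaviour amounts to prefix matching against a candidate list that depends on context (command names, data file names, or variable names). A minimal Python sketch, assuming case-insensitive matching and that file names are filtered by the typed letter in the same way as variable names (the text above only says the data files are listed); the function names and the sample command list are hypothetical and not part of SYSTAT.

```python
def matching(prefix, candidates):
    """Candidates whose names start with prefix, ignoring case, sorted."""
    p = prefix.upper()
    return sorted(c for c in candidates if c.upper().startswith(p))


def suggestions(line, commands, data_files, variables):
    """Dropdown entries for the text typed so far on a command line."""
    words = line.split()
    if not words or line.endswith(" "):
        return []  # nothing is offered until a letter has been typed
    if len(words) == 1:
        return matching(words[0], commands)  # still typing the command
    if words[0].upper() in ("USE", "VIEW"):
        return matching(words[-1], data_files)  # completing a file name
    return matching(words[-1], variables)  # empty list if no file is open


commands = ["USE", "VIEW", "LET", "LIST", "SELECT"]  # illustrative subset
print(suggestions("L", commands, [], []))  # ['LET', 'LIST']
print(suggestions("USE s", commands, ["survey.syz", "sales.syz"], []))
# ['sales.syz', 'survey.syz']
```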


Command autocompletion is enabled by default. You can turn it off by unchecking 
Autocomplete commands in the Edit: Options dialog. 


10. Command Coloring 


The commands, variable names, numbers, strings and comments (REM statements)
that you type are displayed in distinct colors, as follows:


Commands                    Blue
Variable names              Violet
Numbers                     Pink
Strings in double quotes    Pink
Comments                    Green


Coloring makes it easy to identify the various components of a command line,
thereby reducing the risk of syntax errors.


11. Active Tab at the Beginning of Viewspace/Commandspace


By default, the tabs of the Viewspace are arranged in the following order: 

■ Startpage
■ Output Editor (untitled.syo or the name of the output file you save/open)
■ Active Data File (untitledn.syz or the name of the data file you save/open)
■ Graph Editor
■ Viewed Graphs
■ Viewed Data Files


Active and viewed files are described under items 3 and 5 (View and Active
Modes). When a new tab is opened, it is inserted at the beginning of its group. If
you want a new tab to appear as the first tab of the Viewspace instead, click the
arrow in the top right corner of the Viewspace and check [Active Tab at the
Beginning]. You can bring a tab into focus by clicking the arrow and checking the
name of the desired tab. If there are more tabs than are directly visible in the
Viewspace, the selected tab becomes the first tab in the Viewspace or in its group,
depending on whether [Active Tab at the Beginning] is checked. This is especially
useful when you have many tabs open in the Viewspace.


A similar feature is available in the Commandspace. Refer to Chapter 2,
Introducing SYSTAT, for details.


12. Interlinked Command Line and Dialog Interface 


The command line and graphical user interfaces are now interlinked. That is,
commands that you type are automatically reflected in any dialogs that depend on
these commands. For example, you can now do an analysis of variance using
commands and then open the ANOVA: Hypothesis Test dialog to test your model.


13. Closing all Open Files 


You can open any number of data files, command files, and graphs. This can
accumulate a lot of tabs in the Viewspace. You can close a single tab, all data files in
the view mode, or all graphs in the view mode. Right-click a tab and click Close to
close the underlying file. Right-click a data file tab and click Close All to close all
data files that are currently in the view mode. Right-click a graph tab and click
Close All to close all graphs that are currently in the view mode.


14. Quick Access 


SYSTAT provides some widely used basic statistical analyses and exploratory
graphical techniques in a separate menu named Quick Access:

One-Way Frequency Tables...
Pie Chart...
Bar Chart...

Basic Statistics...
Histogram...
Probability Plot...

Two-Way Tables...

One-Sample KS...
Paired t-Test...
Two-Sample t-Test...

Scatterplot...
Correlation...

OLS Linear Regression...
ANOVA...

Time Series Plot...


These items are also available in the Analyze and Graph menus. The items chosen 
here are those used frequently by most users. This menu also gives you a hint about 
how you can customize the menu system so as to access all that is required from a 
single menu. 


15. Window Menu 


The pages of the Viewspace are tabbed by default. SYSTAT provides a Window
menu to arrange the tabs of the Viewspace in several ways:


Cascade
Tile
Tile Vertically
Arrange Icons


To tile the Graph Editor and Data Editor, first close the other pages using the View
menu. Then, to tile the two horizontally, from the menus choose:
Window

Tile
Alternatively, you can choose Cascade or Tile Vertically. This facility can also be
used to view two data files (or two views of the same data file), a data file along
with an output file, and other useful combinations. For more details, refer to
Chapter 7, Customization of the SYSTAT Environment.


16. Interface Themes


You can customize the default menu or, in general, any given menu to suit your
own preferences and needs. This creates a new "Interface Theme". You can then
save this theme by choosing, from the menus:
Utilities

Themes

Save Current Theme...

The theme is saved as a file with a .systheme extension and, if you save it in the
SYSTAT Themes folder, it is listed in the Themes pane of the Startpage. You can
then apply this theme whenever you need to. You can also download themes from
the Internet from time to time. For more details, refer to Chapter 7, Customization
of the SYSTAT Environment.


17. Case Selection Tools


Various tools related to case selection are available on the Data toolbar and menu. 
When you do a case selection (Data: Select Cases dialog or SELECT command), 
you have the option to invert the case selection. To do this, from the menus choose: 


Data 
Invert Case Selection 


or click the Invert Case Selection tool on the Data toolbar. 


When one or more cases are selected, you can navigate through the selection row
by row. To do this, use the First Case in Selection, Previous Case in Selection,
Next Case in Selection and Last Case in Selection tools on the Data toolbar. Or, 
from the menus choose: 

Data 


First Case in Selection, Previous Case in Selection, Next Case in Selection or Last 
Case in Selection. 


18. Crash Recovery and Rescue System 


In the event of an abnormal termination of the application, SYSTAT attempts to 
rescue the command log, active data file and output file. A Rescue Report dialog 
pops up with the paths to the rescued files. Click Send Report, attach the files to 
the mail message that is automatically generated, and send the message. The 
SYSTAT team will use the rescued files to reproduce the problem and fix it. 


When you open SYSTAT the next time, it displays a Systat File Recovery System 
dialog that gives you the option to save the open data files from the previous 
session. This allows you to retrieve any unsaved changes to your data files. 


19. Windows XP Style Interface 


SYSTAT 12 has a Windows XP style interface. That is, it adopts the Windows XP
theme currently applied to the Windows desktop.


Modified Features 


20. Appearance of Data Editor 


The Data Editor has a few enhancements in terms of the appearance of data values, 
value labels and variable comments: 


■ The cells of the header row and column corresponding to the current Data Editor
cell (cursor position) are highlighted. If two or more cells are selected using the
keyboard and/or mouse, the corresponding header row and column cells are
highlighted.

■ If cases are selected using the Select Cases dialog or SELECT command, the
corresponding rows are colored green. If "green" rows are selected using the
keyboard and/or mouse, such cells are checked.

■ The Standard toolbar contains a tool to display and edit the current Data Editor
cell entry. This may be useful if you want to identify where the Data Editor
cursor is in a filled-up grid, or to view/edit a value in isolation from the others.

■ When a variable is declared as a frequency or weight variable, an appropriate icon
is displayed next to the corresponding variable name in the header row.

■ When value labels are defined for a variable, the Data Editor allows you to view
the value labels instead of data values in the corresponding column.

■ When variable comments are defined for a variable, you can view the comments
by pausing the mouse on the corresponding variable name in the header row.


21. Data File Comments in Data Editor


You can write/edit file comments for a data file using the Data Editor. Simply
right-click the corresponding tab for the data file, or the top-left corner cell of the
Data/Variable Editor, and select File Comment. The File Comment dialog pops up,
where you can type the desired comments. The comments can be seen by placing
the cursor in the top-left corner cell of the Data/Variable Editor. Previously, file
comments could be added only through commands, by appending the desired
string to the DSAVE command as an option. The file comments that you specify
are saved when you save the data file.


22. Up and Down Arrow Keys


In SYSTAT 11, the F9 key was used to recall commands processed earlier. You can
now use the up and down arrow keys to navigate through the processed commands
(up to go to the previous command, and down to go to the next). F9 also works as
before.
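Up/Down recall behaves like a cursor moving over the list of processed commands. The sketch below is a hypothetical Python model, not SYSTAT code; in particular, stopping at the oldest command and returning an empty line after moving past the newest one are assumptions about edge cases that the text does not specify.

```python
class CommandHistory:
    """Hypothetical model of recalling processed commands with Up/Down."""

    def __init__(self):
        self._commands = []
        self._pos = 0  # equal to len(self._commands) means "below the newest"

    def processed(self, command):
        self._commands.append(command)
        self._pos = len(self._commands)  # recall restarts from the newest

    def up(self):
        """Previous command; stays on the oldest one at the top."""
        if self._pos > 0:
            self._pos -= 1
        return self._commands[self._pos] if self._commands else ""

    def down(self):
        """Next command; an empty line once past the newest."""
        if self._pos < len(self._commands):
            self._pos += 1
        if self._pos < len(self._commands):
            return self._commands[self._pos]
        return ""


history = CommandHistory()
for cmd in ["USE mydata", "LET x = 1"]:
    history.processed(cmd)
print(history.up(), "|", history.up())  # LET x = 1 | USE mydata
print(repr(history.down()), repr(history.down()))  # 'LET x = 1' ''
```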


23. Default Menu Structure


The default menu has been reorganized by rearranging and recaptioning menu
items to provide easier access to the more common items in a given menu. The
terminology is more general and direct, so that it can be understood by any class of
users. Some items that were previously available only through the toolbar, like
Recent Dialogs and Submit from File List, have also been accommodated in the
menu. Likewise, some items that did not appear in any toolbar, like Full Screen
Viewspace, Customize and Add/Delete/Modify, now appear in the appropriate
toolbar.


The classic menu structure used in version 11 is available as a theme that you can
apply from the Startpage or Utilities menu.


24. Menu Customization


You can now directly create popup menus in the Menu Bar. To do this, from the 
menus choose: 
View 
Customize... 
Or, click the Customize tool on the Standard toolbar. 


Click the Menu tab, enter a name for the popup menu and press Create. 


The new menu is added as the first item in the Menu Bar. Drag and drop the
menu to the desired location.


Yet another enhancement relates to User menu items. Previously, the items created 
under the User menu had to remain in the Menu List sub-menu of User Menu. Now, 
you can drag and drop these items to any location of your choice. 


You can also specify the Status Bar help, tooltip and Bubble Help for User menu 
items using the Add/Delete/Modify dialog. 


25. Status Bar


The status bar of SYSTAT has been enhanced significantly to include switches for 
various global options and processing conditions. For example, the switches 
QGRAPH, HTM and ECHO denote the current global setting for appearance of 
Quick Graphs, mode of output, and echoing commands. You can click any of these 
to toggle the setting; the caption enables or disables accordingly. There are also 
switches for processing conditions in effect, like Frequency (FRQ), By Groups 
(BY) and Case Selection (SEL). Pausing the mouse on any of these displays the 
current declaration, if any. Clicking any of these pops up the corresponding dialog
box where you can change the declaration.


See Chapter 7, Customization of the SYSTAT Environment, for a complete list of
items on the status bar. Of the status bar items available in SYSTAT, the QGRAPH, 
HTM, ECHO, SEL, BY, WGT, FRQ, ID, CAT and OVR items appear by default. 
You can add or remove items from the status bar by right-clicking on it. In the 
context menu that appears, check the items you want to keep and uncheck the items 
you do not use. You can get all the items to appear by selecting All Items; all the 
items will disappear if you select No Items. To revert to the default set of items, 
select Default Items. 


26. Additional Shortcut Keys


Several new keyboard shortcuts have been introduced. For example, you can press 
F6 to invoke the Edit: Options dialog, Ctrl+L to submit the current command line, 
and F4 to invoke the Customize dialog. See the section on Keyboard Shortcuts in
Chapter 7, Customization of the SYSTAT Environment, for a complete list.


27. Context Menus


SYSTAT provides several context menus that pop up on right-clicking in various 
components (tabs or nodes in the three spaces) of its interface. This gives you 
convenient access to some actions that you would otherwise have to perform 
through other means. For example, you can right-click on a data file tab in the 
Viewspace, and click Close, to close the data file. See the section on Menus in 
Chapter 2: Introducing SYSTAT, for a complete list of menu items. 


28. Dialog Boxes
The dialog boxes of SYSTAT have been enhanced as follows:


■ If a variable label is defined for a variable, it appears as a tooltip when you
pause the mouse on the variable name in the variable lists appearing in dialog
boxes.

■ If a variable is declared categorical, an icon appears to indicate that it is
categorical.

■ Grid controls that take any number of rows have been provided in dialog boxes
requiring an indefinite amount of input. For example, you can specify any
number of IF conditions in the If...Then Let dialog. Likewise, you can specify
value labels for any number of data values in the Value Labels dialog.


Deleted Features

29. FEdit
FEdit is no longer available in SYSTAT. However, you can get all the functionality
that it provided through the Commandspace and external command editors. Note
that the Commandspace now allows multiple command files (see 7), and supports
Unicode as well as ANSI encoding (see 8). You can also use the Scratchpad in
the Startpage to type commands that you can then copy and submit.

30. Printing Command Files 
You can no longer print command files through SYSTAT. Use an external text 
editor like Notepad to do the same. 

31. Saving from the Interactive Tab of the Commandspace 



You can no longer save commands that you type in the Interactive tab of the 
Commandspace. However, since the commands that you type here are immediately 
executed anyway, you can save the command log after execution of these 
commands. 


32. Opening .CMD (Command) Files


The .cmd file extension is no longer recognized as a SYSTAT command file
extension. Rename any such files with the .syc extension so that they can be
opened directly by double-clicking.


Data


New Features
33. New Data File Format with Compression


SYSTAT 12 has a new default data file format, with the extension .SYZ. This
format achieves file compression and stores variable labels, value labels, and
category, frequency, weight and ID variable information. For backward
compatibility, you still have the option of saving in the SYSTAT data file format
used by versions 8 through 11, and also in the older .SYS format. However, this
additional information is not saved in those formats.


If you need to save data in an older file format and have value labels defined,
you can use the new GENLAB = filename.syc option of the NAMES command to
generate a command file containing the corresponding LABEL commands, which
recreate the value label information in an older version of SYSTAT.


34. Date AND Time Format for a Date-Time Variable

There is now a new display format for date variables that allows you to display
both the date and the time in a single value. Any valid date format can be
combined with any valid time format.

35. Processing Conditions in Effect Pane

The Variable Editor has a pane showing the processing conditions currently in 
effect, like By Groups, Weight, Frequency, Category and Order variables, and Case 
Selection. 

36. Copying Data as Text

You can select one or more cells in the Data or Variable Editor and copy the content 
to the clipboard as plain text. You can thus copy just about any content that you 
desire, without any formatting. For example, you can set a column in the Data 
Editor to display value labels instead of data values, select part or whole of the 
column, right-click on it, and select Copy as Text. The value label entries get 
copied to the clipboard. 

37. Pasting Data with Custom Row and Column Separators

You can copy data from any source to the clipboard, specify the characters used as
row and column separators in that data, and then paste the data into the Data Editor.
By default, SYSTAT assumes the Tab character as the column separator and Enter
as the row separator. Copy the data, right-click the desired cell of the Data Editor,
and select Paste with Options. Check the column and row separators used, check
Other and enter any custom separators you may have used, and press OK.
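Conceptually, Paste with Options splits the clipboard text first on the row separator and then on the column separator. A minimal Python sketch of that parsing step, with Tab and newline as defaults to mirror the Tab/Enter defaults described above; the function name, and the handling of Windows line endings and of a trailing row separator, are assumptions rather than documented SYSTAT behaviour.

```python
def parse_clipboard(text, col_sep="\t", row_sep="\n"):
    """Split pasted text into rows of cell strings using the given separators."""
    if row_sep == "\n":
        text = text.replace("\r\n", "\n")  # Enter on Windows is CR LF
    rows = text.split(row_sep)
    if rows and rows[-1] == "":
        rows.pop()  # a trailing separator does not start an extra row
    return [row.split(col_sep) for row in rows]


print(parse_clipboard("1\t2\r\n3\t4\r\n"))  # [['1', '2'], ['3', '4']]
print(parse_clipboard("a;b|c;d", col_sep=";", row_sep="|"))  # [['a', 'b'], ['c', 'd']]
```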


38. Pasting Variable Properties 


You can paste the properties of one variable onto another variable using the context 
menu for columns of the Data Editor or rows of the Variable Editor. Copy the 
desired column(s) of the Data Editor, or row(s) of the Variable Editor, right-click 
the desired location and select Paste Properties. 


39. Variable Statistics 


You can see summary statistics and a histogram of a numeric variable by 
right-clicking the variable name in the Data Editor and selecting Variable Statistics. 
In the window that pops up, click Send to Output to send the statistics to the 
Output Editor. 


40. Undo and Redo of Data Modifications 


An undo facility is provided in the Data Editor. You can undo up to 32 recent 
changes in the same session. Any change you make in the Data Editor can be undone 
through this facility, for example, manual editing of data, Find and Replace, and 
commands such as LET, RANK and SORT. 


For variable properties, undo/redo works only for changes made to the variable 
name and type. Note that undo/redo does not work for modifications to data 
processing conditions such as Category, By Groups and Select Cases. 
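A bounded undo history like the one described can be modeled with a fixed-length queue. Once 32 changes are stored, recording a new one discards the oldest. The Python sketch below is illustrative only and does not show how SYSTAT implements the feature. The class name `UndoHistory` is hypothetical, and the rule that a new change clears the redo history is an assumption based on common editor behavior.

```python
from collections import deque


class UndoHistory:
    """Keeps at most `limit` undoable changes; the oldest is discarded first."""

    def __init__(self, limit=32):
        self.undo_stack = deque(maxlen=limit)  # appending beyond limit drops the oldest
        self.redo_stack = []

    def record(self, change):
        self.undo_stack.append(change)
        self.redo_stack.clear()  # assumption: a fresh change invalidates redo

    def undo(self):
        if not self.undo_stack:
            return None
        change = self.undo_stack.pop()
        self.redo_stack.append(change)
        return change

    def redo(self):
        if not self.redo_stack:
            return None
        change = self.redo_stack.pop()
        self.undo_stack.append(change)
        return change


history = UndoHistory()
for i in range(40):
    history.record(f"edit {i}")
print(len(history.undo_stack))  # 32: edits 0 to 7 can no longer be undone
print(history.undo())           # edit 39
```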


41. Recoding Variables 


You can recode variables using the Data: Transform: Recode dialog box or the 
RECODE command. You can either replace the values of an existing variable or 
create a new variable with the recoded values, and you can recode more than one 
variable at a time. See 67 for details about the RECODE command. See Chapter 6: 
Grouping Variables and BY Groups of the Data Volume for more details about this 
feature. 


42. Trimming Data 


Data on chosen variables can be subjected to a user-chosen percentage trimming 
(deletion of cases) of extreme values on one chosen side or on both sides. You can 
do this using the Data: Trim dialog or using the TRIM command. 
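Percentage trimming is easier to picture with a small model. The Python sketch below deletes the most extreme cases from the lower tail, the upper tail, or both. It is not SYSTAT's algorithm. The function name `trim` and its `side` values are hypothetical. The sketch also assumes that the percentage applies to each trimmed side separately, that the number of cases per side is rounded down, and that the remaining cases keep their original order.

```python
import math


def trim(values, percent, side="both"):
    """Delete the most extreme `percent`% of cases from the chosen side(s).

    Assumptions (not from the manual): the percentage applies to each trimmed
    side separately, the case count per side is rounded down, and the
    surviving cases keep their original order.
    """
    n = len(values)
    k = math.floor(n * percent / 100)  # cases to delete on each trimmed side
    order = sorted(range(n), key=lambda i: values[i])  # case indices, smallest first
    drop = set()
    if side in ("lower", "both"):
        drop.update(order[:k])
    if side in ("upper", "both"):
        drop.update(order[n - k:])
    return [v for i, v in enumerate(values) if i not in drop]


data = list(range(1, 21))       # 20 cases with values 1..20
print(trim(data, 10, "both"))   # deletes 1, 2, 19 and 20
print(trim(data, 10, "upper"))  # deletes 19 and 20
```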
43. Centering and Stacking Data Through the Dialog Interface 


The Center feature in the Data menu, applied to selected variables, replaces each 
value in the original columns with that value minus the mean of its column. Each 
resulting column therefore has an arithmetic mean of zero. In prior versions of 
SYSTAT, you could only achieve this through the CENTER command. You can now 
also do it through the Data: Center dialog box. 
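The centering rule (each value minus its column mean) fits in a few lines of Python. This sketch is illustrative only. The function name `center` is hypothetical, and the sketch assumes the column has no missing values.

```python
def center(column):
    """Return the column with its arithmetic mean subtracted from every value."""
    mean = sum(column) / len(column)
    return [x - mean for x in column]


print(center([2.0, 4.0, 9.0]))  # mean is 5.0, so the result is [-3.0, -1.0, 4.0]
```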


The Stack feature in the Data menu, applied to a set of variables in a data set, 
creates a new file in which the columns of variable values are stacked one on top of 
another, in the order in which they are chosen. In prior versions of SYSTAT, you 
could only achieve this through the STACK command. You can now also do it 
through the Data: Reshape: Stack dialog box. 
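The stacking order can be modeled as concatenation of the chosen columns in the order chosen. This Python sketch is illustrative only. The function name `stack` is hypothetical, and it returns a single list of values. It does not try to reproduce any extra identifying variables that SYSTAT may write to the new file.

```python
def stack(data, variables):
    """Concatenate the chosen columns top to bottom, in the order chosen."""
    stacked = []
    for name in variables:
        stacked.extend(data[name])
    return stacked


columns = {"A": [1, 2], "B": [3, 4], "C": [5, 6]}
print(stack(columns, ["C", "A"]))  # [5, 6, 1, 2]
```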


44. Trimming Leading and Trailing Spaces in String Data 




SYSTAT allows you to enter string data values with leading and trailing spaces. 
Some of SYSTAT's character functions, namely LFT$, CNT$, RGT$, LPD$ and 
RPD$, also insert leading and/or trailing spaces in string values. Check this option 
if you want such spaces to be trimmed out for processing by the LABEL, ORDER 
and RECODE commands. 


Modified Features 
45. Field Width and Decimal Places for Numeric Data 


The maximum field width, i.e., the total number of digits in a data value, including 
the decimal point and the digits after the decimal point, is now 23, and the maximum 
number of decimal places is 14. 

46. Number of Characters for String Data 


A string variable can take values up to 256 characters long. Consequently, the 
width of a string is assumed to be 256 when applying the character functions 
LPD$ and RPD$. 


47. Variable Name 
The length of a variable name can be up to 256 characters. 
48. Variable Labels 


Prior versions of SYSTAT allowed you to define variable labels using the 
VARLAB command, and these were reflected in the output of the STATS module. 
They were limited to 12 characters. Variable labels can now be up to 256 characters 
long and are reflected in the output of all graphs and numeric modules. You can 
define variable labels using the VARLAB command, the Variable Labels column in 
the Variable Editor, or the Variable Properties dialog. These labels are saved in the 
data file. 


You can control the display of variable labels in the output using the VDISPLAY 
command. Or, from the menus choose: 

Edit 

Output Format 
Variable Label Display 

Select either (variable) Label, (variable) Name, or Both. If you select Both, the 
output will display "variable label (variable name)". You can also set this in the 
Output tab of the Edit: Options dialog. 


49. Variable Properties Dialog Box 


With the Variable Properties dialog box, you can navigate through the variables in 
the data file while setting their properties. If you only need to view the properties, 
you can suppress automatic saving of any changes you make. 


This dialog box also allows you to specify the variable label, and set the variable 
as categorical. 


50. Print Preview of Selected Data 


You can select a part of the data in the Data or Variable Editor, and click Print 
Preview in the File menu to preview just the selected data before printing. You can 
then print the selected data using the Print button in the Print Preview window. 


51. Importing/Exporting Data Files 


You can import from or export to newer versions of external data file formats than 
was possible in SYSTAT 11. For details, open the Open dialog for data files and see 
the Files of type drop-down list. 


52. Pasting Data (Overwriting) through Menus 


In prior versions of SYSTAT, the PASTE command allowed you to overwrite 
existing data columns. You can now achieve the same using the context menu for 
Data Editor columns or Variable Editor rows. Copy a Data Editor column or 
Variable Editor row, and select Paste Data. 
53. Inserting Variables from the Clipboard Directly 


In prior versions of SYSTAT, the Paste feature in the Data Editor always inserted 
variables, but only after prompting the user for the variable name. You can now 
insert variables directly. The variable name is set automatically to the original 
name with "_1", or a higher number as appropriate, appended. Copy a Data Editor 
column or Variable Editor row, and select Insert Variable(s) from Clipboard. 
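The automatic naming rule (append "_1", or the next free number) can be sketched in Python as follows. This is illustrative only. The function name `clipboard_variable_name` is hypothetical, and the sketch assumes names are compared exactly as written, although SYSTAT may compare them without regard to case.

```python
def clipboard_variable_name(original, existing):
    """Append _1, _2, ... to `original` until the name is not already used."""
    suffix = 1
    while f"{original}_{suffix}" in existing:
        suffix += 1
    return f"{original}_{suffix}"


names = {"HEIGHT", "HEIGHT_1"}
print(clipboard_variable_name("HEIGHT", names))  # HEIGHT_2
```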

54. Data Editor Accessible During BASIC, MATRIX and RANDSAMP 

In prior versions of SYSTAT, the Data Editor was disabled during BASIC, MATRIX 
or RANDSAMP operations. It now remains accessible during these operations as well. 

55. Merging Data Files 

• This feature can now be used even if the files are not sorted by key variables. 

• This feature requires two data files. 

• When files are merged, both variable properties and data processing conditions 
are carried over. 


• In SYSTAT 11, if two merged files contained a common variable, only the 
contents of that variable from the first file appeared in the merged file. In 
SYSTAT 12, the contents from both files appear in the merged file, with file 
names attached to the variable names. For instance, if filename1 and filename2 
both contain height, the merged file contains the variables height_filename1 and 
height_filename2. 
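The renaming rule for common variables can be modeled in Python as shown below. This sketch covers the naming rule only. It ignores key variables and row matching, and it is not SYSTAT's merge procedure. The function name `merge_columns` and the sample data are hypothetical.

```python
def merge_columns(files):
    """Combine columns from several files, renaming shared variables.

    A variable name found in more than one file is written once per file as
    <variable>_<filename>; all other names are kept as they are.
    """
    counts = {}
    for columns in files.values():
        for var in columns:
            counts[var] = counts.get(var, 0) + 1
    merged = {}
    for file_name, columns in files.items():
        for var, values in columns.items():
            new_name = f"{var}_{file_name}" if counts[var] > 1 else var
            merged[new_name] = values
    return merged


files = {
    "filename1": {"age": [31, 45], "height": [150, 160]},
    "filename2": {"height": [151, 162], "weight": [50, 60]},
}
print(list(merge_columns(files)))
# ['age', 'height_filename1', 'height_filename2', 'weight']
```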


56. Appending Data Files 


When files are appended, both variable properties and data processing conditions 
are carried over to the appended file. 


57. Value Labels 


You can now give labels to variable values through the Variable Editor. These 
labels are saved in the data file and appear in the output by default. You can control 
the display of value labels in the output using the LDISPLAY command. Or, from 
the menus choose: 
Edit 
Output Format 
Value Label Display 

Select either (value) Label, Data (value), or Both. If you select Both, the output 
displays both the value label and the data value. You can also set this in the Output 
tab of the Edit: Options dialog. 


In prior versions of SYSTAT, the appearance of value labels in the output was 
contingent on the variable being declared categorical. This restriction has been 
removed. That is, even non-categorical variables can be labeled. 


58. Ordering Categories 


In prior versions of SYSTAT, by default, categories were ordered by data values 
when no value labels were defined, and by value labels when these were defined. 
Now, by default, categories are always ordered by data values. To sort categories 
by value labels, use the LABEL option of the ORDER command, or check Labels 
under Sort applies to in the Data: Order of Display dialog. 
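The difference between the two orderings can be shown with a small Python model. This is illustrative only. The function name `order_categories` and the value labels are hypothetical. The `by_label=True` branch mimics the effect described for the LABEL option of ORDER, not its implementation.

```python
def order_categories(values, labels=None, by_label=False):
    """Return the distinct category values in display order."""
    distinct = set(values)
    if by_label and labels:
        # Sort by the label text instead of by the data value.
        return sorted(distinct, key=lambda v: labels.get(v, str(v)))
    return sorted(distinct)  # SYSTAT 12 default: order by data value


labels = {1: "Placebo", 2: "Drug A", 3: "Control"}  # hypothetical value labels
data = [3, 1, 2, 1, 3]
print(order_categories(data, labels))                 # [1, 2, 3]
print(order_categories(data, labels, by_label=True))  # [3, 2, 1]: Control, Drug A, Placebo
```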


Deleted features 


59. Data Aggregation using Value Labels 


In prior versions of SYSTAT, defining value labels appropriately allowed you to 
aggregate values into a smaller number of categories. This is no longer possible. 
Use the Recode feature (41) to aggregate data. 


60. Saving in the SYSTAT .SYS Format 


The .SYS data file format is no longer supported for saving. Such files can still be 
read, but not written. 


61. Importing/Exporting BMDP Data Files 


Importing and exporting BMDP data files is no longer supported. 


Commands 


New features 


62. !! (for Inserting Comments at the End of a Command Line) 


In SYSTAT 12, you can use double exclamation marks (!!) to comment a line in 
command scripts. Apart from having the same functionality as the REM command, 
you can also use !! to insert comments at the end of a command line. 
63. NAMES (GENLAB Option) 


The NAMES command has a GENLAB option by which you can generate a set of 
LABEL commands in the output corresponding to the value labels defined in the 
data file. GENLAB also allows a filename as the option value to generate a 
command file containing the corresponding LABEL commands. This is particularly 
useful if you need to recreate the value label information in an older version of 
SYSTAT. 


The DICTIONARY option for the NAMES command has been enhanced to list in 
greater detail all the properties of all variables in the data file, like whether a 
variable is Categorical, By, Frequency, Weight etc. This option will also display 
file comments, if any. 

64. RNDGEN 
RNDGEN lets you set the random number generator from the command line 
interface. A global option already exists in the General tab of the Edit: Options 
dialog. RNDGEN WH sets the Wichmann-Hill algorithm, and RNDGEN MT sets the 
Mersenne-Twister algorithm, which is the default. 

65. VDISPLAY 

Use this command to control the display of variable label information in the output. 

See 48 for details on the use of this command. 

66. LDISPLAY 

Use this command to control the display of value label information in the output. 

See 57 for details on the use of this command. 

67. RECODE 

Use the RECODE command to recode existing variables or to create new variables. 

See 41 for details about the Recode feature. 

68. TRIM 


Use the TRIM command to trim a user-specified percentage of extreme values from 
one chosen side or from both sides of the data. See 42 for details about the Trim feature. 


69. Temporary Variables 


Variables that are created temporarily in memory for data-related operations are 
called temporary variables. You can assign either numeric or string values to these 
temporary variables without opening a data file. Like string data variables, 
temporary string variable names must end with a dollar sign ($). In SYSTAT 12, 
assigning a temporary variable to a data variable is possible, but the reverse is not 
possible without the help of the DATA or DATA$ functions. You can clear these 
variables from memory using the CLEAR VARIABLES command. 


Syntax 


var =value 


Example 


NEW 
REPEAT 1 
x=5 


LET z=x +y 

w = DATA (z, 1) 

a$ = "SYSTAT" 

b$ = " IS GOOD" 

LET c$ = CAT$ (a$, b$) 
PRINT z, w, c$ 


70. Temporary Arrays 


SYSTAT 12 provides a new facility to create temporary subscripted variables. 
You can create a subscripted variable of the required size temporarily in memory 
with the ARRAY command, and clear it from memory using the CLEAR ARRAYS 
command. 


Creating an array of numeric type 


Syntax 


ARRAY var (size) 


Example 


ARRAY x(5) 

FOR k =1 TO 5 
x(k)=ZRN(0,1) 
PRINT x(k) 

NEXT 

CLEAR ARRAYS = x 
Creating an array of string type 
Syntax 


ARRAY var(size)$ 


Example: 


NEW 

ARRAY x(3)$ 

FOR i=1 to 3 

k$= 'SBU' 

x (i) =MID$ (k$,i,1) 
PRINT x(i) 

NEXT 

CLEAR ARRAYS=x 


71. IF .. THEN .. ENDIF 


You must issue an ENDIF command to execute IF statements that extend beyond a 
single line. That is, multi-line IF statements are cold commands, and ENDIF is the 
hot command that triggers their execution. 


For example, 
IF CASE < 50 THEN LET XVAR = 1 


will execute instantly. The following commands will execute when the ENDIF 
command is encountered: 


IF CASE < 50 THEN 
LET XVAR = 1 
ENDIF 


72. BEGINBLOCK .. ENDBLOCK 


BEGINBLOCK marks the beginning of a block of statements, which are executed 
when ENDBLOCK is given. This is especially useful in IF ... THEN ... ELSE 
constructs. See 92 for one use of this construct. 


73. WHILE...ENDWHILE 


This works much like the FOR...NEXT loop. It takes a condition and a series of 
commands, and the commands in the loop are executed as long as the condition is 
satisfied. Only temporary variables may be used in the condition, but inside the 
loop you can use both data and temporary variables. 


Syntax 


WHILE (condition) 
commands 
ENDWHILE 


Example 


a=1 

WHILE (a <=5) 
PRINT a 
a=a+1 
ENDWHILE 


NEW 

REPEAT 1 

DEFVAR x / TYPE = NUMBER 
LET x = 0 

k=1 

WHILE (k <= 5) 

LET x = x+k 

k= k +1 

ENDWHILE 


74. FUNCTION .. RETURN (to Create User Defined functions) 


You can define your own functions (as long as the function names differ from 
SYSTAT functions and other existing functions) and use them in other commands 
during the current session. In this version, you can define only functions that return 
numeric values, not string values. Within a function, only temporary variables may 
be used for any manipulation. When calling the function, you can pass temporary 
numeric variables or numeric constants as argument values. 


Syntax 


FUNCTION funcname(argumentlist) 
{ 

commands 

RETURN expression 

} 
Example 


FUNCTION myern (n) 
{ 

sum=0 

FOR i=1 to n 
sum = sum + ERN(0,1) 
NEXT 

RETURN sum 

} 

NEW 

FOR i = 1 to 10 
w=myern (i) 
PRINT w 

NEXT 


75. DATA and DATA$ Functions (to Identify Individual Data Values) 


The DATA function returns the numeric value read from a specified location 
(variable and row) of the active data file. The DATA$ function returns the string 
value read from a specified location of the active data file. 


Syntax 


DATA( numvar, rownum) 
DATA$(strvar, rownum) 


Example 


USE ourworld 

k = DATA(urban, 10) 

r$ = DATA$( country$, 5) 
PRINT k, r$ 


76. LEN Function (to Determine Length of a String) 


This function returns, as a numeric value, the length of a given data value, 
temporary string variable, or string constant. 


Syntax 


LEN( strvar or strconst) 


Example 


USE ourworld 

LET k = LEN(country$) 
r$ = 'WORLD' 

p = LEN(r$) 

PRINT p, LEN ( 'WONDER' ) 
77. NCAT Function (to Determine Number of Categories) 


This function returns a numeric value which gives the number of categories in a 
given data variable. 


Syntax 


NCAT(arg1, arg2) 


arg1 is a data variable, and arg2 is an optional argument that can take the value 
0 or 1. The default value is 0. If you wish to include missing values as a category, 
use 1. 


Example 


USE afifi 

IF sysincr < 40 THEN LET disease = . 

x= NCAT (disease, 1) 

PRINT 'The no of categories in drug is', NCAT (drug) 
PRINT 'The no of categories in disease is', x 


78. Multicase Functions 


Multicase functions such as CMIN, CMAX, CMEAN, CSUM and CAVG, which worked 
only inside the MCMC feature in SYSTAT 11, are now available globally in SYSTAT 12. 
Syntax 
functionname(varexpression, startcase, endcase, increment) 


where startcase, endcase and increment are optional, with default values 1, the total 
number of cases, and 1, respectively. 
Example 


USE afifi 
LET x = CMIN (SYSINCR, 43, 58) 
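The way startcase, endcase and increment select cases can be modeled with a Python slice. This sketch is illustrative only and does not reproduce SYSTAT's handling of missing values. The function name `cmin` mirrors CMIN for readability but is hypothetical. The sample data are made up, and the sketch assumes that case numbers start at 1 and that endcase is inclusive.

```python
def cmin(values, startcase=1, endcase=None, increment=1):
    """Minimum of a column over cases startcase, startcase + increment, ...

    Case numbers are 1-based; endcase defaults to the last case and is
    treated as inclusive (an assumption).
    """
    if endcase is None:
        endcase = len(values)
    # Convert 1-based inclusive case numbers to a 0-based Python slice.
    selected = values[startcase - 1:endcase:increment]
    return min(selected)


sysincr = [7, 3, 8, 1, 9, 4]   # made-up values
print(cmin(sysincr))           # all cases: 1
print(cmin(sysincr, 1, 6, 2))  # cases 1, 3 and 5: 7
```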


79. Reserved Keywords 


Some words are predefined in SYSTAT to perform a specific task. These are called 
reserved words or key words. You cannot use these words as variable names. 


Global key words are: INPUT, LET, FOR, NEXT, IF, THEN, ELSE, ARRAY, DIM, 
PRINT, SELECT, WHILE, EOF, BOF, EOG, BOG, CASE and COMPLETE. 
There is no longer any restriction on using function names as variable names. For 
instance, you can use LET log = a + log(5) to create a variable named LOG. 


Modified features 


80. Command Syntax Rules 


The command syntax has been streamlined across features, and informative error 
or warning messages are provided whenever the syntax is in error. Refer to the 
Command Syntax Rules section in Chapter 5: Command Language for the rules to 
follow when typing commands. 


81. BASIC, MATRIX and STATS Modules Made Global 


In SYSTAT 12, the BASIC, MATRIX and STATS modules have been made global. 
What this means is that you can use all the commands of these three modules 
directly anywhere in the program. 

82. PRINT and PLENGTH 

There were two PRINT commands in SYSTAT 11: PRINT for output length, and 
PRINT in BASIC. The PRINT command that chose the length of output has been 
renamed PLENGTH. 

The PRINT command displays numeric and string constants, values of variables, 
and values of numeric expressions in the output. Items in the list must be 
separated by commas (,). 


PRINT constants or varlist or expression 


For example, 


USE rainfall 

PRINT 'The values of y are:' 
PRINT y 

a=3 

b=4 

p$ = 'See:' 

PRINT p$, 1, SQR(b), a, 2+1+1 
PRINT 'The result is: ', a+(b*2) 


83. FPATH (PROJECT and GALLERY Options) 


You can set paths to various types of SYSTAT-related files using the FPATH 
command. Two new options have been added to this command: PROJECT and 
GALLERY. 

When the PROJECT option is given, subfolders are appended to the project 
directory path. The subfolder names are \Data, \Command, \Gallery, \Graph, 
\Output, and \Temp. 


The command FPATH 'c:\Store', on the other hand, sets a common directory for all 
file types. (This is equivalent to checking Common directory in the File Locations 
tab of the Edit: Options dialog.) This is what SYSTAT 11 defined as the project 
directory, in contrast to the SYSTAT 12 definition given above. 


84. DEFVAR 


SYSTAT 11 defined display format of variables using the DISPLAY = m.n option of 
the DEFVAR command. Here, m is the field width and n is the number of decimal 
places, and the two are separated by a dot. In SYSTAT 12, the dot has been replaced 
by a comma. 


85. INSERT 


The INSERT command inserts columns (variables) and rows (cases) in a data file at 
a specified location. 


INSERT loc, count /COLUMNS, NAMES = 'name1', 'name2',... 
INSERT loc, count / ROWS 
Here, loc is the position at which to insert the blank rows or columns. 


count is the number of rows or columns to insert. Previously count was optional, 
and it would take a default value of 1. Specifying loc is now compulsory. 


86. PASTE (After Running CUT Command) 


There are two types of PASTE commands in SYSTAT, one that follows the CUT 
command, and another that follows the DELETE command. In prior versions of 
SYSTAT, the PASTE that follows the CUT command would replace existing 
variables, if any. Now, PASTE will insert new variables before existing variables, 
if any. 


For example, 


USE AFIFI 

CUT DRUG 

PASTE 2 

In SYSTAT 11, PASTE replaces the variable SYSINCR with DRUG. 
In SYSTAT 12, PASTE inserts the variable DRUG in the second 
column. 
87. DELETE (for ROWS and COLUMNS) 


In SYSTAT 11, with the DELETE command you were able to do only row-related 
operations. In SYSTAT 12, this command has been changed to DELETE ROWS and 
DELETE COLUMNS so that you can do both row- and column-related operations. 


DELETE ROWS = rownumlist 
DELETE COLUMNS = varlist 


For example, 


USE RAINFALL 

DELETE ROWS = 2, 5 

USE SICKDATE 

DELETE ROWS = 5..10 

USE AFIFI 

DELETE COLUMNS = DRUG, DISEASE 

USE OURWORLD 

DELETE COLUMNS = POP_1983..POP_2020 


The DELETE COLUMNS command thus works like the DROP command which is 
also supported. 

88. SAVE and WORK 

In Matrix, SAVE and WORK have been replaced by MSAVE and MWORK. 


In Basic Statistics, SAVE and WORK have been replaced by SSAVE and SWORK. 
The options VARIABLES and AGGREGATE are available. 


In SYSTAT 11, SAVE and WORK were global commands, but in SYSTAT 12 they 
will function only within a module. 

89. EXTRACT (VARIABLES Option) 

In SYSTAT 11, you could select a subset of variables in a data file by using the 
MERGE command on a single data file. To do the same in SYSTAT 12, use the 
EXTRACT command as follows: 


EXTRACT filename / VARIABLES = varlist 


This gives added functionality to the EXTRACT command, as you can now select a 
subset of cases as well as variables with it. For example, 
USE ourworld 


SELECT BIRTH_RT < 40 
EXTRACT c:\population / VARIABLES = COUNTRY$, POP_1983 
Earlier, the EXTRACT command worked only when SELECT was active. There is no 
longer any such restriction. 


90. CATEGORY 


When category variable information is stored in the data file, it is likely to cause 
conflicts with temporary category declarations made for a particular analysis. 
Hence, a few new options have been introduced for this command as given below: 


CATEGORY varlist / option 


ADD       Adds the variables in varlist to the existing list of categorical variables. 
          This is the default declaration method in SYSTAT 12. 

REPLACE   Replaces all existing category variables with varlist. This was the 
          default declaration method in SYSTAT 11. 

MISS      Includes cases with a missing value for the categorical variable as an 
          additional category. 

NOMISS    Excludes cases with missing values when defining categories. This is 
          the default treatment of missing values. 

OFF       Removes the variables in varlist from the existing category list, or 
          removes all variables if varlist is not specified. The second action is 
          equivalent to CATEGORY in SYSTAT 11. 

?         Lists all categorical variables. Use this option without giving varlist. 


For example, when the following set of commands is executed: 


ANOVA 

USE ANOVA09 
CATEGORY DV2 
CATEGORY DV3 
DEPEND Z1 
ESTIMATE 


SYSTAT 11 considers only the last variable, i.e. DV3, as categorical. 


SYSTAT 12 considers both DV2 and DV3 as categorical variables. You have to use 
the REPLACE option if you want to mimic SYSTAT 11 behaviour. 
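The ADD, REPLACE and OFF rules can be modeled as list operations on the current set of categorical variables, which reproduces the ANOVA09 example above. This Python sketch is illustrative only. The function name `category` is hypothetical, and MISS, NOMISS and ? are not modeled.

```python
def category(current, varlist=(), option="ADD"):
    """Return the new list of categorical variables after a CATEGORY command."""
    if option == "ADD":        # SYSTAT 12 default: extend the existing list
        return current + [v for v in varlist if v not in current]
    if option == "REPLACE":    # SYSTAT 11 behaviour: start over with varlist
        return list(varlist)
    if option == "OFF":        # drop the listed variables, or all if none given
        return [v for v in current if v not in varlist] if varlist else []
    raise ValueError(f"option not modelled here: {option}")


cats = category([], ["DV2"])
cats = category(cats, ["DV3"])
print(cats)                                # ['DV2', 'DV3']
print(category(cats, ["DV3"], "REPLACE"))  # ['DV3']
```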


91. FOR ... NEXT 


In SYSTAT 12, only temporary variables can be used in the FOR condition of a 
FOR ... NEXT loop, but inside the loop you can use both data and temporary 
variables. 

FOR indexvar = startindex TO endindex STEP expression 
commands 
NEXT 
In SYSTAT 12, the default value of the STEP expression is 1 when startindex is 
less than endindex, and -1 when startindex is greater than endindex. For example, 
the commands 

k=5 
FOR i = k to 1 
PRINT i 
NEXT 

will print 5, 4, 3, 2 and 1, one below the other. 
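The default STEP rule can be modeled with Python's `range`. This sketch is illustrative only. The function name `for_values` is hypothetical, and it handles integer bounds only. The manual does not say what happens when startindex equals endindex, so the sketch assumes a step of 1 (one pass through the loop) in that case.

```python
def for_values(start, end, step=None):
    """Index values visited by FOR i = start TO end [STEP step], integers only."""
    if step is None:
        # Default step: 1 when counting up, -1 when counting down.
        step = 1 if start <= end else -1
    stop = end + (1 if step > 0 else -1)  # make `end` inclusive
    return list(range(start, stop, step))


print(for_values(5, 1))  # [5, 4, 3, 2, 1]
```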
92. Conditional FOR ... NEXT 
Conditional FOR ... NEXT in SYSTAT 11 had the following syntax: 


IF condition THEN FOR 
commands 
NEXT 


This has been replaced by the following syntax: 


IF condition THEN BEGINBLOCK 
commands 

ENDBLOCK 

ENDIF 


For example, 


USE AFIFI 

c=0 

IF DRUG=DISEASE then BEGINBLOCK 
c=c+1 

PRINT c, '- ', SYSINCR 
ENDBLOCK 

ENDIF 


93. USE, LET, SELECT, DROP and DELETE in Matrix 


SYSTAT 11 allowed a mixture of string and numeric columns in matrices. SYSTAT 
12 lets you read either the numeric or the string columns of a data file as a matrix 
through a new MTYPE option of the USE command. By default, MTYPE is 
NUMERIC. 


USE filename / MAT=matname MTYPE=NUMERIC or STRING 
For example, 

USE SICKDATE / MAT = x 

SHOW x 

USE SICKDATE / MAT = y MTYPE = STRING 
SHOW y 


The LET and SELECT commands of Matrix have been replaced by MLET and 
MSELECT respectively. 


To delete rows and columns of a matrix, you can use MDELETE instead of DELETE 
and DROP that were used in SYSTAT 11. 


MDELETE ROWS=rownumlist or rownamelist / MAT=matname 
MDELETE COLUMNS=colnumlist or colnamelist/ MAT=matname 


For example, 


MAT x 2, BAe Bye CiT 8a 9] 

MAT y > Diy 443 23, 56, 7) 
ROWNAMES x = case1, case2, case3 
COLNAMES x = var1, var2, var3 

MDELETE ROWS = case2, case3 / MAT = x 
MDELETE ROWS = 1 / MAT=y 

SHOW x, y 

MDELETE COLUMNS = var1 / MAT= x 
MDELETE COLUMNS = 1, 2 

SHOW x, y 




If you do not specify MAT=matname in the command, then SYSTAT assumes the 
active matrix by default. 


94. CLEAR (for Temporary Variables, Arrays and Matrices) 


The CLEAR command was earlier used for clearing matrices from memory. That 
functionality is now provided by: 


CLEAR MATRIX = matnamelist 


For example, 


SHOW a, b 
CLEAR MATRIX = a, b 
SHOW a, b 


The old syntax CLEAR matnamelist will also continue to work. 
The CLEAR command can also be used to clear temporary variables from memory. 


CLEAR VARIABLES = tmpvarlist 


For example, 


a=5 
b= 6 
PRINT a, b 


CLEAR VARIABLES = a, b 


The following form of the CLEAR command is used to clear arrays from memory: 


CLEAR ARRAYS = arraylist 


For example, 


USE rainfall 
ARRAY p / xX, 
FOR i = 1 to 
PRINT p(i) 
NEXT 

CLEAR ARRAYS = p 




95. CYAN and MAGENTA Color Option Values 


SYSTAT 11 allowed cyan and magenta colors to be set for graphs only by using the 
color codes 11 and 12 respectively. You can now use CYAN and MAGENTA as 
option values in such options as COLOR, ACOLOR and FCOLOR. 


96. Common Graph Options 


Common graph options now work for a given graph command only if they are 
meaningful for that command. In earlier versions, irrelevant options were silently 
ignored while drawing the graph. Now an error message (in some cases, a warning 
message) to that effect is displayed. 

97. STATS Commands 
You can use CSTATISTICS and RSTATISTICS instead of CBSTAT and RBSTAT 
respectively. 
To compute column statistics and stem-and-leaf plots for a subset of the cases, use 
the following command syntax: 


CSTATISTICS varlist / ROWS = rownum 
CLSTEM varlist / ROWS = rownum 
To compute row statistics and stem-and-leaf plots for a subset of the variables, use 
the following command syntax: 


RSTATISTICS rowlist / COLUMNS = varlist 
RWSTEM rowlist / COLUMNS = varlist 


98. TESTING Commands 


The commands of the TESTING module have been streamlined so that the 
separator in the two-sample cases is always the equal sign, and redundant options 
have been removed. The old and new command syntaxes are listed below: 


Old Syntax                           New Syntax 

PROP n1 m1 * n2 m2                   PROP n1 m1 = n2 m2 
PROP var1 var2 var3 var4             PROP var1 var2 = var3 var4 
TCORR varlist / ZERO                 TCORR varlist 
TCORR var1 var2 var3 var4 / TWO      TCORR var1 var2 = var3 var4 
VARI varlist * grpvar / SEVERAL      VARI varlist * grpvar 
                                     (Two or several will be done depending on whether 
                                     grpvar takes two or more values respectively) 


99. MODEL Command in PROBIT 


The PROBIT module no longer requires that the PROBIT option of MODEL be given 
to perform probit analysis. PROBIT with the MODEL command will do probit 
analysis, and CMGLH or LOGIT with MODEL will do logistic regression. 


100. QCREGRESS and RUNCHART (in QC) 


The REGRESS command in QC, for drawing regression control charts, has been 
replaced by QCREGRESS to avoid conflict with the REGRESS module. The RUN 
command has been replaced by RUNCHART. 


101. Alternative and Replacement Commands 


The following command replacements have been made: 

• RIDGEREG to replace RIDGE 
• TAUB to replace TAU in CORR 
• ADJDIFF to replace the DIFFERENCE option of CONTRAST in GLM 
• DUNNETT = LT or GT in place of DUNNETT = ONE for POST in GLM 
• RESET to replace CLEAR in MIX 
• INVGAUSSIAN to replace INVEGAUSSIAN in PCA of QC 
• CHISQUARE to replace CHI_SQUARE in ESTIMATE of SIGNAL 
102. TABLES and GRAPHS Options for OUTPUT 
The TABLES and GRAPHS options of the OUTPUT command will no longer work. 


Tables and graphs will be left-aligned by default, and you can use the Center or 
Right tools in the Format Bar to change the alignment subsequently. 


103. Deleted Commands 
The following commands are no longer supported in SYSTAT 12: 
• DUSE, DECIMAL, LOADLABEL, LOGSAVE 
• LCOLOR and THREEDFONT 
• COMPATIBLE, ERASE, GOTO, HOLD, RUN and STOP in BASIC 
• SET in MATRIX 
• GENERATE in RANDSAMP 

104. Alternative Notations of Relational/Logical Operators 


Alternative notations of relational/logical operators, like LT, GT, ><, =<, =>, &, 
&&, and | are no longer supported. 


Output 


New features 
105. HTML-Based Output 
The output is now HTML based with tabular and text modes. Tables can be directly 
and conveniently selected, and they can be copied to external applications without 
distortion. There are collapsible links for parts of the output that can be selectively 
expanded or collapsed with a mouse-click. HTML editing is possible; simply 
select View Source in the context menu. 


The classic output format has also been implemented, and it has been consistently 
implemented across features. The tabular format consistently applies the font 
specified under Proportional output; the classic format consistently applies the font 
specified under Monospaced output. 


You can use the mouse scroll wheel to move to the desired parts of the output. 




106. Organizing Output by Data Files 


By default, the output generates a new data node every time a data file is opened, 
which means the output is arranged chronologically. You can instead group it by 
data nodes, so that all the graphs and analyses pertaining to a given data file are 
placed together. You must restart SYSTAT for this option to take effect. 


107. Enhanced Print Preview 


You now have more options while previewing output before printing. You can 
switch between portrait and landscape modes, specify the page setup including 
headers, footers and margins, turn headers and footers on or off, view full width, 
full page, various page views, preview just a selection of the output, and 
shrink/enlarge the content. 


108. Setting Very Small Font Sizes 


You can now specify very small font sizes even below 8 points. This may help in 
certain cases where the usual font sizes would not fit a very wide table in a printed 
page. To specify the fonts, from the menus choose: 


Edit 
Options... 


Click the Output tab and enter a font size under Proportional output and/or 
Monospaced output. 


109. Wrapping and/or Truncating Text in Tables 


The text in tables can sometimes be very long, especially when variable and/or 
value labels are defined. In such cases, by default, the text in each cell is wrapped 
onto multiple lines if it extends beyond 15 characters. Row headers are wrapped if 
they extend beyond three times this number, i.e., 45 characters. You can set a 
different number as desired, or uncheck this option to prevent wrapping. 


Apart from wrapping, the text in tables can also be truncated. By default, in each 
cell, the truncation will happen at 45 characters. You can change this number or 
even turn off truncation. 
110. Inserting Images, Block Format, Indent and Outdent
You can now insert images anywhere in the output: click where you want the image to appear and use the Insert Image tool on the Format Bar. Or, from the menus choose:
Edit
Format
Insert Image...


In the Picture dialog that appears, specify the path to the image file, the alternate text, the layout, and the spacing.

You can also specify the format for the current line or selection of output using the 
Block Format tool on the Format Bar. There are also tools to Indent or Outdent the 
current line or selection of output. 


111. Customizing the Output Scheme


You can now customize the font color, style (regular or bold), and background color of the various components of the output (excluding graphs), as well as the page background. To do this, from the menus choose:
Edit
Options...
Click the Output Scheme tab. The output items you can change are:


Echoed commands, text, table body and caption, header, sub-header, footer, error and warning messages, and page background.


If you create an interface theme with these settings, they are also saved to the theme file.

112. Setting Detailed Output Organizer Node Captions
You can customize the captioning of text nodes in the Output Organizer. By default, the caption is the title of the analysis that the node pertains to, and the associated command appears as a tooltip on mouse hover. You can instead set the tooltips themselves to appear as node captions.

For a given analysis, the associated command is the most significant command related to that analysis, typically the HOT command. For example, for least squares regression, the default node caption is 'OLS Regression', whereas the detailed node caption is the MODEL command line.
Modified Features


113. Page Width


To control the page width, SYSTAT 11 offered two options: Narrow and Wide. Narrow displayed 80 characters in a line and Wide displayed 132; the default was Narrow. In SYSTAT 12, the number of characters is set dynamically based on the current font size and output format (tabular or classic). For example, if you select Narrow and a font size of 10, about 77 characters appear in the tabular format and about 82 in the classic format. The corresponding numbers for Wide are about 106 and 113.


With a fixed page width, very wide tables are split into two or more parts (appearing one below the other), even with the Wide setting. SYSTAT now also offers an unlimited page width, the None option, which prevents tables from splitting no matter how wide they are.


114. Saved Output


When a SYSTAT output file (.SYO) is saved, the data files are linked to the output file. This means you can open an output file saved in a previous session and continue working with it, provided the underlying data files still exist in the same path. If you do not want to use output files across sessions, you can disable this option: simply uncheck Link data files to output file in the General tab of the Edit: Options dialog.


When a SYSTAT output file is saved, the command log is also saved with it, so you can open an output file saved in a previous session and re-use the commands from that session. If you do not use output files across sessions, uncheck Save command log in output file in the General tab of the Edit: Options dialog.


115. Saving Text Output to File


You can save text output to a file using the OUTPUT command as before, provided 
you set the output to appear in the classic format. 
Deleted Features

116. Copying and Pasting Output Through Output Organizer
You can no longer copy/paste output using the Output Organizer. However, the 
drag-and-drop facility is available. 

117. Saving as Framed HTML Page
When saving in the HTML format, prior versions of SYSTAT saved the output as a framed HTML page, with a frame on the left-hand side similar to the Output Organizer nodes. This is no longer possible.

118. Viewing Graphs as Frames Only


Prior versions of SYSTAT had the option of viewing graphs in the output as tiny 
icons instead of full images. This facility is no longer available. 


Help 


New Features
119. Help Through the Commandspace


To get help on a word, type it in any tab of the Commandspace, then either click it and press Ctrl+F1, or right-click it and select HELP command in the context menu.


120. Bubble Help
Apart from the help provided on the status bar about each menu item, a more 
detailed description is provided in a "bubble" that appears when you pause the 
mouse on the menu item for a few seconds. You can specify the number of seconds 
to pause the mouse before the help appears, or even turn off the help completely. 
To do this, from the menus choose: 
Edit 

Options... 

In the General tab, enter the desired Time delay in seconds. Uncheck Display 
Bubble Help if you do not want the help to appear at all. 


121. Data File References


SYSTAT comes with a folder of data files, which can be accessed through the File: Open: Data dialog. The folder contains over 350 data files used in the nearly 600 examples provided in the user manual and online help. The online help system gives details of these files: the sources of the data, a brief description of the study that generated the data, and a description of the variables in each file. The details are available in the Data File References book. Links are inserted wherever these files are referenced, so that a single mouse click gives you information about the file.

122. Acronym Expansions
A list of acronym expansions is provided in the online help system. To access these, 
from the menus choose: 
Help 

Acronym Expansions... 

123. Tutorial
SYSTAT provides a tutorial that walks you through the basics of SYSTAT with a number of examples. To access it, from the menus choose:

Help 
Tutorial... 

124. FAQ
A list of Frequently Asked Questions (FAQ) with answers to them is provided in 
the online help system. To access this, from the menus choose: 
Help 

FAQ... 
Modified Features 


125. Discussion and Glossary Pages
In SYSTAT 11, the discussion and glossary pages in the online help appeared in the right frame of the help window. They now appear in a pop-up window.
Graphics 
New Features
126. Interactivity for BEGIN..END Graphs and Quick Graphs
Graph Interactivity is now available even for Quick Graphs and BEGIN..END 
graphs. All interactivity features, barring a few, are provided for these graphs. 
127. Hexagonal Binning (Manual and Automatic)


SYSTAT 12 provides the hexagonal binning option for two-dimensional Scatterplots, Probability plots, and Quantile plots. Hexagonal binning creates a sort of bivariate histogram for large data sets, in which the X-Y plane is tessellated by a regular mesh of hexagons. All hexagons have the same radius, and the color of each hexagon depends on the range into which the frequency of the (x, y) values in it falls. You can also set SYSTAT to do hexagonal binning automatically. From the menus, choose:
Edit
Options...
Click the Graph tab and enter a Threshold value; if this number is greater than the number of cases in your file, hexagonal binning will not be done.


When hexagonal binning is done automatically, the plotting plane is divided using 25 grid cuts along both the X and Y axes. You can specify any number in the range (2, 50]. For example, if you enter 7, the entire graph frame is split into a 7 x 7 square grid (grid lines are not displayed), and each cell frequency is represented by a colored hexagon (the graph legend shows the color coding).
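The general technique of hexagonal binning (assign each point to the nearest centre of a regular hexagonal lattice, then count) can be sketched in Python as below. This is a generic sketch, not SYSTAT's implementation: the function name `hexbin_counts`, the lattice construction from two interleaved rectangular lattices, and the reading of `cuts` as the number of hexagon columns across the X range are all assumptions for illustration.

```python
import math
from collections import Counter

def hexbin_counts(points, x_min, x_max, y_min, cuts=25):
    """Count (x, y) points per hexagonal cell (generic sketch, not SYSTAT's code).

    Hexagon centres lie on two interleaved rectangular lattices:
      A at (i * sx, j * sy) and B shifted by (sx / 2, sy / 2),
    with sy = sqrt(3) * sx. Their union is a triangular lattice, so assigning
    each point to its nearest centre tiles the plane with regular hexagons.
    """
    sx = (x_max - x_min) / cuts      # assumption: `cuts` hexagon columns across X
    sy = sx * math.sqrt(3)
    counts = Counter()
    for x, y in points:
        u = (x - x_min) / sx         # position in lattice units
        v = (y - y_min) / sy
        ia, ja = round(u), round(v)  # nearest centre of lattice A
        ib = math.floor(u) + 0.5     # nearest centre of lattice B
        jb = math.floor(v) + 0.5
        # Squared distances in units of sx (the v offset is scaled by sy / sx = sqrt(3))
        da = (u - ia) ** 2 + 3 * (v - ja) ** 2
        db = (u - ib) ** 2 + 3 * (v - jb) ** 2
        ci, cj = (ia, ja) if da <= db else (ib, jb)
        counts[(x_min + ci * sx, y_min + cj * sy)] += 1
    return counts
```

The resulting counts per hexagon centre would then be mapped to colors, as the graph legend does.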

128. GRADIENT and WIREFRAME Options


SYSTAT 11 allowed you to set the gradient and wireframe for surfaces through its 
interactivity features only. In SYSTAT 12, these options are also available at the 
time of drawing the surfaces. Use the GRADIENT and WIREFRAME options with the 
DENSITY, PLOT, PPLOT, and FPLOT commands. 

129. Turning off Frame Titles
You can now turn off frame titles in graphs having multiple frames. Use the FTITLE = OFF option with any graph command except MAP, FPLOT, DRAW and WRITE. This feature is particularly useful when you are drawing graphs using BEGIN..END.

130. Base Line for Anchor Bar
SYSTAT 11 allowed you to anchor bars at any given base, but no base line was drawn. SYSTAT 12 draws a base line at the specified base value.


131. Object Tracker
As part of the graph interactivity feature, an object tracker is drawn around the 
object that is currently in focus. This makes it easy for you to identify individual 
objects in the graph being edited. 

132. Repositioning the Graph Title
The graph title can now be repositioned using the mouse. Simply click the title, hold down the mouse button, and drag it to the desired location.


133. Specifying Canvas Background and Borders


The canvas is the rectangular space that surrounds all the graphs that appear in a single display. You can fill the canvas with a different color or apply a color scheme: by choosing an option from the Color scheme dropdown list, you can apply various designs to the canvas. When None is selected in Color scheme, you can select a color through the Color dialog by clicking the Fill color button. (With other choices in Color scheme, the fill color is ignored.) You can also change the style, width, and color of the canvas boundary. Click the Color button and choose a color from the Color dialog that pops up, or define a custom color. Select a line style from the Style dropdown list. Select or enter a value between 1 and 10 in the Width dropdown list.


134. Global Graph Options Through Dialog


The following global settings for graphs can now be set through the Graph tab of 
the Edit: Options dialog: 


Origin, Eye, Facet and Depth 
These were available in prior versions of SYSTAT only through commands 
(ORIGIN, EYE, FACET and DEPTH). 


You can also specify the default background color to use in all graphs through the 
Edit: Options dialog. In prior versions of SYSTAT, the background was transparent 
by default, and you could change it for particular graphs using the FCOLOR option. 
You can continue to use the FCOLOR option as before. 


135. Setting the Image Type of Graphs in the Output


You can set the image type of graphs that appear in the Output Editor. To do this, from the menus choose:
Edit
Options...
Click the Output tab and select the desired Image format. You can choose from PNG, BMP, GIF, JPG and EMF image types.
Modified Features

136. Enhanced Interactive Graphics with Single Dialog Interface

• A single dialog box with several options for graph interactivity has been provided.
• Interactivity is available at various levels: Graph, Frame, Axis, Legend, and Element.
• The dialog is context sensitive: the options relevant to the graph object that you click on are immediately available.
• You can still access the other options by clicking the desired tab.

137. Percentage Bar Charts
SYSTAT 11 drew the percentage bar chart by stacking bars in a single bar. 
SYSTAT 12 draws percentage bar charts with separate bars where the height of 
each bar is a percentage of the total. If you request the Stack option with the 
Percentage option, then you will get the same chart as SYSTAT 11's percentage bar 
chart. 

138. Range for ETHICK
The ETHICK option of BAR, DOT, LINE, PROFILE and PYRAMID, which specifies the thickness of error bars, previously accepted any value, even one that widened the error bars beyond the width of the bar. The value is now restricted to be less than or equal to 1; that is, the valid range is (0, 1].

139. Stacked Bar Chart
If you request the Stack option for a univariate bar chart, SYSTAT 12 will stack the 
bars one above the other. 

140. Enhanced Highlight Point Tool
The Highlight Point tool in the Graph Editing toolbar of SYSTAT 12 automatically 
shifts the focus to the Data Editor while showing the case corresponding to the 
highlighted point. 

141. Enhanced Graph Tooltips
Graph tooltips (which appear in the status bar) show more information in SYSTAT 12 than in SYSTAT 11. For example, the case number is shown for individual points in a scatterplot, and the count is shown for each bar in a bar chart.


142. More Drawing Attributes
You can specify more drawing attributes for annotations than you could in SYSTAT 11. In addition to the existing options, you can also specify the line style, fill style and fill color to be used in text and graph annotations.


143. More Text Tool Font Options
SYSTAT 11 offered the standard font dialog options for text annotations. SYSTAT 12 also lets you set the font rotation, font background color, and case.
144. Enhanced Page View


The Page View for graphs now gives you the ability to stretch/shrink graphs as 
well. 


Deleted Features 
145. Some Graph Saving Options


Some graph saving options that were available in SYSTAT 11 may not be available in SYSTAT 12.

146. Importing Map Files with User Specified Names
You can import map files as before with the TYPE = MAP option of the IMPORT command. However, the imported map files take the same names as the .dat files that you are importing; you can no longer specify a SAVE command before the IMPORT command to set custom names.


147. Some Dynamic Explorer Options
You can no longer use the Dynamic Explorer to do a power transformation or to change the confidence level for confidence kernels and ellipses in plots, the tension for plot smoothers, or the number of bars.


148. Ruler in Page View


The Ruler no longer appears in the Page View for graphs. 
Statistical Features


New Features
1. AIC and Schwarz's BIC


SYSTAT computes:
• the Akaike Information Criterion (AIC),
• its corrected version, and
• Schwarz's Bayesian Information Criterion (BIC)
as a part of the output for least squares regression, ANOVA, GLM, MANOVA, logistic regression, probit analysis, mixed models and survival analysis.
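For reference, the textbook definitions for a model with maximized log-likelihood ln L, k estimated parameters and n cases are AIC = -2 ln L + 2k, corrected AIC = AIC + 2k(k + 1)/(n - k - 1), and BIC = -2 ln L + k ln n. The Python sketch below evaluates them; SYSTAT's exact conventions (for example, which parameters are counted in k) are not stated here, so the function and its arguments are illustrative assumptions.

```python
import math

def information_criteria(log_likelihood, k, n):
    """Textbook AIC, corrected AIC and Schwarz's BIC.

    log_likelihood -- maximized log-likelihood of the fitted model
    k              -- number of estimated parameters
    n              -- number of cases used in the fit
    """
    if n - k - 1 <= 0:
        raise ValueError("the corrected AIC needs n > k + 1")
    aic = -2.0 * log_likelihood + 2.0 * k
    aicc = aic + 2.0 * k * (k + 1) / (n - k - 1)   # small-sample correction
    bic = -2.0 * log_likelihood + k * math.log(n)
    return {"AIC": aic, "AIC (corrected)": aicc, "BIC": bic}

# Hypothetical model: 3 parameters, log-likelihood -120.5, 50 cases
criteria = information_criteria(-120.5, 3, 50)
```

Smaller values indicate a better trade-off between fit and complexity; BIC penalizes extra parameters more heavily than AIC once n exceeds about 7.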
2. Quade, Anderson-Darling (Nonparametric) Tests


• The Quade test measures differences among related samples. It is a nonparametric two-way ANOVA for comparing related groups, and an alternative to the Friedman test.
• The Anderson-Darling test examines the distribution of a single variable. It is a standard goodness-of-fit test.
• The Friedman test now allows you to specify grouping and blocking variables.
• The statistic and p-value obtained from the different nonparametric tests are saved to a SYSTAT data file.
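To make the Anderson-Darling statistic concrete, the Python sketch below computes the textbook A² for normality, with the mean and standard deviation estimated from the sample: A² = -n - (1/n) Σ (2i - 1)[ln F(y_i) + ln(1 - F(y_{n+1-i}))] over the sorted data. It is an illustration, not SYSTAT's procedure; the function names are hypothetical, and the conversion of A² to a p-value (which needs tables or approximations) is not shown.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def anderson_darling_normal(data):
    """Anderson-Darling A^2 for normality, mean and SD estimated from the data."""
    x = sorted(data)
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 observations")
    mean = sum(x) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))
    if sd == 0:
        raise ValueError("all observations are equal")
    f = [normal_cdf((v - mean) / sd) for v in x]
    total = 0.0
    for i in range(1, n + 1):                 # i follows the 1-based formula
        total += (2 * i - 1) * (math.log(f[i - 1]) + math.log(1.0 - f[n - i]))
    return -n - total / n
```

Larger values of A² indicate stronger evidence against normality; the statistic weights the tails heavily, which is why a single large outlier inflates it.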
3. Multinormal Tests
• Measures of multivariate skewness and kurtosis are computed.
• Tests for multivariate normality are carried out using these measures.
• The Henze-Zirkler test is also computed.
• Mahalanobis distances can be saved.
• A Quick Graph of scaled squared Mahalanobis distances is drawn.
4. LAD, M, LTS and S (Robust) Regression
Two new robust regression techniques:
• Least Trimmed Squares (LTS), and
• Scaled (S)
have been added to the regression suite.


LMS, LAD and M regression features have been modified to be consistent with these new techniques.


5. Partial Least-Squares Regression


The Partial Least Squares (PLS) technique is one of the methods for constructing regression equations. It can be regarded as an extension of the multiple linear regression technique. PLS has recently gained importance in many areas of application, such as chemometrics and economics, in situations where the number of variables is large relative to the number of cases and there is likely to be multicollinearity among the predictor variables.


The PLS method extracts some latent factors from the data and then fits a 
regression equation between them. 


SYSTAT offers Partial Least Squares regression using the two most popular algorithms, viz., the Nonlinear Iterative Partial Least Squares (NIPALS) algorithm and the Straightforward Implementation of Partial Least Squares (SIMPLS) algorithm. The standard errors of the estimated regression coefficients are calculated by the jackknife procedure. You can choose between two types of cross-validation procedures, viz., leave-one-out and random exclusion, to validate the fitted regression model. SYSTAT provides score plots as Quick Graphs. Further, different types of output can be saved to a SYSTAT data file for further analysis.


Because cross-validation methods are available, resampling techniques are not offered under the PLS Regression feature.


6. Mixed Models


This feature provides analysis of a variety of linear mixed models, giving different types of estimates of fixed-effects parameters and variance components, with standard errors, confidence intervals and tests of hypotheses. Many types of data, such as repeated measures, growth curves and longitudinal data, can be analyzed with this feature.


SYSTAT provides the following linear mixed models:
• Variance Components models (VC)
• Hierarchical Mixed models (MIXED)


With the Variance Components (VC) feature, you can:
• Estimate a variance components model for both balanced and unbalanced data using one of the following methods: three types of analysis of variance (ANOVA) methods, Minimum Variance Quadratic Unbiased Estimation (MIVQUE(0)), the Maximum Likelihood method, or the Restricted Maximum Likelihood (REML) method
• Test various hypotheses
• Use any number of fixed and/or random effects, including interactions (crossed effects) and nesting (nested effects)
• Use categorical as well as continuous predictor variables

The models handled by VC constitute a subclass of those handled by MIXED, which allows more general covariance structures for the random effects and the random error. The subclass of models dealt with by VC is arguably the most frequently used type of linear mixed model.


Hierarchical Linear Mixed Models in SYSTAT fits and analyzes mixed models with structured covariance/correlation matrices for random effects and residuals. You can use four types of covariance structures for random effects:
• Variance Components
• Compound Symmetry
• Diagonal
• Unstructured
and three types of covariance structures for error components:
• Variance Components
• Compound Symmetry
• First-Order Autoregressive (AR(1))
You can use the following options for estimating covariance parameters:
• Maximum Likelihood
• Restricted (Residual) Maximum Likelihood


In addition to the standard output, such as covariance parameter estimates, standard errors, confidence intervals and Wald's test for the significance of the effects, SYSTAT offers:
• Fixed-effect and random-effect solutions, ANOVA-type tests for fixed effects, and t-tests of whether each coefficient is zero
• Log-likelihood, Akaike Information Criterion (AIC), Akaike Information Criterion Corrected (AIC (corrected)) and Bayesian Information Criterion (BIC)
• A Save option to save residuals, parameter estimates and BLUPs, among others

7. Response Surface Methods


Response surface methodology (RSM) is used to develop an empirical model, commonly called a response surface, for the response of a process in terms of the relevant controllable factors. RSM also determines the operating conditions that produce the optimum response.


You can generate standard response surface designs such as the central composite (CC) design or the Box-Behnken (BB) design. SYSTAT allows you to specify and fit a model up to second order, optionally including blocks. When there is more than one response, RSM fits a model for each response separately and gives the ANOVA table and the 'Lack of Fit' test for each response model. Contour plots of each response for all pairs of factors are produced.


SYSTAT also provides two kinds of optimization techniques: canonical analysis and desirability analysis. Canonical analysis determines the stationary point and its nature (maximum, minimum or saddle point), and also provides the optimal response. Desirability analysis optimizes more than one response variable simultaneously; SYSTAT gives desirability plots as Quick Graphs. SYSTAT also provides ridge analysis to help find the direction of the optimum when the above techniques have led to a saddle point.
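For a second-order model, the stationary point located by canonical analysis has a standard closed form, sketched below. The notation (b for the vector of linear coefficients, and a symmetric matrix B holding the pure quadratic coefficients on its diagonal and half of each interaction coefficient off it) is ours and is not taken from SYSTAT's output.

```latex
\hat{y} = b_0 + \mathbf{x}^{\top}\mathbf{b} + \mathbf{x}^{\top}\mathbf{B}\,\mathbf{x},
\qquad
\frac{\partial \hat{y}}{\partial \mathbf{x}} = \mathbf{b} + 2\mathbf{B}\,\mathbf{x} = \mathbf{0}
\;\Longrightarrow\;
\mathbf{x}_s = -\tfrac{1}{2}\,\mathbf{B}^{-1}\mathbf{b},
\qquad
\hat{y}_s = b_0 + \tfrac{1}{2}\,\mathbf{x}_s^{\top}\mathbf{b}.
```

The nature of the stationary point follows from the eigenvalues of B: all negative indicates a maximum, all positive a minimum, and mixed signs a saddle point, which is the case in which ridge analysis becomes useful.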


8. Trend Analysis (Time Series)


SERIES provides nonparametric techniques for detecting and estimating trends in a series. The aim of trend analysis is to determine the significance of a trend in an observed series and to estimate the magnitude of that trend.

The following tests are provided in Trend Analysis:

For non-seasonal data:

• Mann-Kendall test: a nonparametric, rank-based trend test.


• Sen's slope estimator: a nonparametric linear slope estimator that works most effectively on monotonic data. It measures trend magnitude and is not greatly affected by gross data errors, outliers, or missing data.
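Both non-seasonal procedures can be sketched in a few lines of Python: the Mann-Kendall S is the sum of sign(x_j - x_i) over all pairs i < j, and Sen's slope is the median of the pairwise slopes (x_j - x_i)/(j - i). This is a minimal sketch assuming no tied values (the tie correction to the variance of S is omitted) and equally spaced observations; the function names are illustrative and SYSTAT's output is not reproduced.

```python
import math
import statistics
from itertools import combinations

def mann_kendall(series):
    """Mann-Kendall S and its normal-approximation Z (no correction for ties)."""
    n = len(series)
    s = 0
    for i, j in combinations(range(n), 2):        # every pair with i < j
        diff = series[j] - series[i]
        s += (diff > 0) - (diff < 0)              # sign of later minus earlier
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)            # continuity correction
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    return s, z

def sens_slope(series):
    """Sen's slope: median of all pairwise slopes (x_j - x_i) / (j - i)."""
    return statistics.median(
        (series[j] - series[i]) / (j - i)
        for i, j in combinations(range(len(series)), 2)
    )
```

Z is compared with the standard normal distribution. Because Sen's slope is a median, a single gross error barely moves it: for the series 1, 2, 3, 4, 100 the estimate is still 1.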
For seasonal data:


• Seasonal Kendall test. This nonparametric test is based on the Mann-Kendall test, with the data set adjusted for seasonality (for example, four quarters of the year). It is not greatly affected by missing data, but requires a minimum number of data points for each season.


• Modified Seasonal Kendall test. This test is used when the seasons are correlated.


• Homogeneity test. If the trend is upward in one season and downward in another, the Seasonal Kendall test and slope estimators are not meaningful. This test provides a single statistic that indicates whether the seasons are behaving similarly. If the hypothesis of homogeneous seasonal trends over time is rejected, it is advisable to compute the Mann-Kendall statistic and slope estimator for each individual season.


Modified Features
9. Resampling


SYSTAT computes bootstrap estimates and confidence intervals in a number of features, such as Descriptive Statistics, CORR and REGRESS. Under bootstrap, you also get a confidence interval for the parameter concerned using two popular methods, viz., the Percentile method and the Bias-corrected and accelerated method.


• Descriptive Statistics. SYSTAT gives a summary based on resampling for the mean, median, variance, standard deviation, skewness, and kurtosis. You get resampling-based estimates along with their bias and standard error.


• Correlations. SYSTAT gives a summary based on resampling for Pearson, Spearman, Gamma, Tau-b, and MU2 correlation coefficients. You get resampling-based estimates along with their bias and standard error.


• Least-Squares Regression. SYSTAT gives a summary based on resampling for linear regression. You get resampling-based estimates of the regression coefficients along with their bias and standard error.


To obtain the resampling-based summary output in these modules, specify the SAMPLE command before the module's HOT command.
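The Percentile method, together with the bias and standard-error summaries described above, can be sketched in plain Python as follows. This is an illustration under stated assumptions, not SYSTAT's code: the function name, the default number of resamples, and the way percentile positions are rounded to indices are our choices, and the Bias-corrected and accelerated interval is not shown.

```python
import math
import random
import statistics

def bootstrap_summary(data, stat=statistics.fmean, n_resamples=2000,
                      level=0.95, seed=None):
    """Bootstrap bias, standard error and percentile confidence interval."""
    rng = random.Random(seed)
    n = len(data)
    # Resample with replacement and recompute the statistic each time
    estimates = sorted(stat(rng.choices(data, k=n)) for _ in range(n_resamples))
    alpha = 1.0 - level
    lo_idx = max(0, math.floor(alpha / 2 * n_resamples))
    hi_idx = min(n_resamples - 1, math.ceil((1 - alpha / 2) * n_resamples) - 1)
    original = stat(data)
    return {
        "estimate": original,
        "bias": statistics.fmean(estimates) - original,  # mean of resamples minus estimate
        "se": statistics.stdev(estimates),               # spread of the resampled statistic
        "ci": (estimates[lo_idx], estimates[hi_idx]),
    }
```

Passing `stat=statistics.median` (or any function of a list) gives the same summary for another statistic; fixing `seed` makes the result reproducible.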
10. New Probability Distributions


The following new probability distributions have been added to the Probability Calculator, Fitting, Kolmogorov-Smirnov tests, Function Plots, Anderson-Darling tests, and Probability Plots:


• Benford's Law
• Logarithmic Series
• Log-Logistic
• Erlang
• Non-Central t
• Non-Central Chi-square
• Non-Central F
• Studentized Maximum Modulus
• Generalized Lambda
• Half-Normal
• Smallest Extreme Value
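As an example of one of these distributions, Benford's law for the first significant digit has a simple closed form, P(d) = log10(1 + 1/d) for d = 1, ..., 9, and the cumulative probabilities telescope to log10(1 + d). The Python sketch below evaluates both; the function names are illustrative and do not correspond to SYSTAT commands.

```python
import math

def benford_pmf(d):
    """P(first significant digit = d) under Benford's law, for d = 1..9."""
    if d not in range(1, 10):
        raise ValueError("d must be an integer from 1 to 9")
    return math.log10(1.0 + 1.0 / d)

def benford_cdf(d):
    """P(first significant digit <= d) = log10(1 + d), since the sum telescopes."""
    if d not in range(1, 10):
        raise ValueError("d must be an integer from 1 to 9")
    return math.log10(1.0 + d)

probabilities = [benford_pmf(d) for d in range(1, 10)]
```

The digit 1 leads about 30% of the time and the digit 9 less than 5% of the time.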


11. Basic Statistics (Column and Row)


• Geometric mean, harmonic mean, and trimmed mean are available for column and row statistics.


• Along with the Shapiro-Wilk test, you can also perform the Anderson-Darling test for normality.


• Mardia's skewness and kurtosis coefficient tests and the Henze-Zirkler test are available for checking the normality of multivariate data.
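The three new means can be reproduced with Python's standard library for comparison. The trimming convention used here (dropping a whole number of values, floor(proportion x n), from each end) is an assumption; SYSTAT's definition of the trimming proportion may differ, and `trimmed_mean` is a hypothetical helper.

```python
import statistics

def trimmed_mean(data, proportion=0.1):
    """Mean after dropping floor(proportion * n) values from each end.

    Assumed convention; other packages interpolate fractional trimming.
    """
    x = sorted(data)
    k = int(proportion * len(x))
    if 2 * k >= len(x):
        raise ValueError("trimming proportion is too large")
    return statistics.fmean(x[k:len(x) - k])

values = [2.0, 3.0, 4.0, 5.0, 6.0, 100.0]
geo = statistics.geometric_mean(values)    # requires positive values
harm = statistics.harmonic_mean(values)
trim = trimmed_mean(values, 0.2)           # drops 2.0 and 100.0
```

For positive, non-constant data the harmonic mean is always below the geometric mean, which is below the arithmetic mean; the trimmed mean here (4.5) is far less affected by the outlying 100 than the ordinary mean.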
Note:

The STATS module is now globally available in SYSTAT. That is, you can 
directly run the commands related to this module at any time. 

You can also use CSTATISTICS and RSTATISTICS commands instead of CBSTAT 
and RBSTAT respectively. 

The earlier command syntax to request column statistics for a subset of the rows 
has been replaced. Use the option ROWS = rowlist to specify the rows for which 
column statistics are required. Similarly, use COLUMNS = varlist to specify the 
variables for which row statistics are required. 


12. Crosstabulation


• Standardized tables, and association measures based on them, can be computed.

• Tables as well as the measures computed from them can be saved.

• A table of counts and percents can be produced.

• Cell statistics can be computed through the dialog interface.

13. Smart Correspondence Analysis

You can now use a two-way frequency table format for the data file while performing Smart Correspondence Analysis. The first column should contain the row labels.

14. Loglinear Models

You can now use the Loglinear Models: Estimate dialog box to build your model. You can either specify the individual terms of your model using the various buttons provided, or type in the right-hand side of the model.

15. Association Measures

Under the Simple Correlations feature, the following new measures are available:

• Rank-ordered measures: Stuart's Tau-c;

• Unordered measures: Phi, Cramér's V, Contingency, Lambda and Uncertainty coefficients; and

• Binary measures: Anderberg (S7), Yule's Q, Hamann, Dice, Sneath, Ochiai, Kulczynski and Gower2.
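Several of the unordered and binary measures have short closed forms. The Python sketch below computes the Pearson chi-square, Cramér's V and the contingency coefficient for an r x c table, and Phi and Yule's Q for a 2 x 2 table, using textbook definitions. SYSTAT's handling of zero cells and its other measures (for example, Anderberg or Gower2) are not reproduced, and the function names are illustrative.

```python
import math

def chi_square(table):
    """Pearson chi-square and total count for an r x c table of counts."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    stat = 0.0
    for i, r_tot in enumerate(rows):
        for j, c_tot in enumerate(cols):
            expected = r_tot * c_tot / n
            stat += (table[i][j] - expected) ** 2 / expected
    return stat, n

def cramers_v(table):
    """Cramer's V = sqrt(chi2 / (n * (min(r, c) - 1)))."""
    chi2, n = chi_square(table)
    k = min(len(table), len(table[0]))
    return math.sqrt(chi2 / (n * (k - 1)))

def contingency_coefficient(table):
    """Pearson's contingency coefficient C = sqrt(chi2 / (chi2 + n))."""
    chi2, n = chi_square(table)
    return math.sqrt(chi2 / (chi2 + n))

def phi_2x2(a, b, c, d):
    """Phi for [[a, b], [c, d]]; its absolute value equals Cramer's V for 2 x 2."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

def yules_q(a, b, c, d):
    """Yule's Q = (ad - bc) / (ad + bc) for [[a, b], [c, d]]."""
    return (a * d - b * c) / (a * d + b * c)
```

For the table [[10, 20], [30, 40]], for example, Yule's Q is (400 - 600)/(400 + 600) = -0.2.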

Scatterplot matrices are drawn for up to 20 variables and an unlimited number of cases.
16. Least Squares Regression


• To check the normality of residuals, SYSTAT provides the Kolmogorov-Smirnov (Lilliefors) test, the Shapiro-Wilk test, and the Anderson-Darling test.

• For each variable in the fitted model, the output now includes the variance inflation factor (VIF), which can be used as a multicollinearity diagnostic.

• Based on the last estimated model, SYSTAT now predicts the dependent variable for a new set of observations of the independent variables, with confidence and prediction intervals at the specified level.

• When the model contains a single predictor, a plot of the fitted regression line with confidence and prediction intervals is displayed. With two predictors, a 3-D fitted plane is displayed.
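The VIF for predictor j is 1 / (1 - R_j²), where R_j² comes from regressing predictor j on the other predictors; equivalently, it is the j-th diagonal element of the inverse of the predictors' correlation matrix. The standalone Python sketch below uses that identity. It is an illustration, not SYSTAT's computation, and the helper names are hypothetical.

```python
import math

def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length predictor columns."""
    centred = []
    for col in columns:
        mean = sum(col) / len(col)
        centred.append([v - mean for v in col])
    norms = [math.sqrt(sum(v * v for v in c)) for c in centred]
    p = len(centred)
    return [[sum(a * b for a, b in zip(centred[i], centred[j])) / (norms[i] * norms[j])
             for j in range(p)] for i in range(p)]

def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (fine for small matrices)."""
    p = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(p)]
           for i, row in enumerate(matrix)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are (numerically) perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[p:] for row in aug]

def variance_inflation_factors(columns):
    """VIF_j = 1 / (1 - R_j^2) = j-th diagonal element of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]
```

With two predictors correlated at r = 0.8, both VIFs equal 1 / (1 - 0.64), about 2.78; uncorrelated predictors give VIFs of exactly 1.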

17. Logistic Regression

• SYSTAT provides confidence intervals for parameter estimates and odds ratios, at a user-specified confidence level, in binary, multinomial, and conditional logit models.

• It also gives Cox and Snell's and Nagelkerke's R-square statistics, along with McFadden's Rho-square.

• The Receiver Operating Characteristic (ROC) curve is displayed as the Quick Graph for binary logistic models only. The area under the ROC curve is also reported.

• In the case of binary logistic models, you can specify the cutoff point for the classification table. The default cutoff point is 0.5.

• The Quasi Maximum Likelihood (QML) option has been renamed Robust Standard Error (RSE).
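The three pseudo R-square measures are simple functions of the log-likelihoods of the intercept-only and fitted models, and the area under the ROC curve equals the proportion of (event, non-event) pairs that the model ranks correctly. The Python sketch below uses the textbook formulas; the function names, and the rule of predicting an event when the probability is at least the cutoff, are assumptions rather than SYSTAT's documented behavior.

```python
import math
from collections import Counter

def pseudo_r_squares(ll_null, ll_model, n):
    """Cox-Snell, Nagelkerke and McFadden measures from two log-likelihoods.

    ll_null  -- log-likelihood of the intercept-only model
    ll_model -- log-likelihood of the fitted model
    n        -- number of cases
    """
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)   # upper bound of Cox-Snell
    return {
        "Cox and Snell": cox_snell,
        "Nagelkerke": cox_snell / max_cox_snell,        # rescaled to reach 1
        "McFadden": 1.0 - ll_model / ll_null,
    }

def roc_auc(scores, labels):
    """Area under the ROC curve: share of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def classification_table(scores, labels, cutoff=0.5):
    """Counts of (actual, predicted) pairs; predicts 1 when score >= cutoff (assumed rule)."""
    return Counter((y, int(s >= cutoff)) for s, y in zip(scores, labels))
```

Because Cox and Snell's statistic cannot reach 1 for a discrete outcome, Nagelkerke's version divides it by its maximum and is therefore always the larger of the two.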


18. Two Stage Least Squares


TSLS now computes regressions with a polynomial distributed lag (PDL) structure. The polynomial distributed lag is a method of including a large number of lagged variables in a model while reducing the number of coefficients to be estimated, by requiring the lag coefficients to lie on a smooth polynomial function of the lag. A PDL (polynomial distributed lag) variable specification may be used as an independent (predictor) variable in any linear regression procedure.
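The idea can be sketched with the Almon formulation: if the lag coefficients satisfy beta_i = gamma_0 + gamma_1 i + ... + gamma_d i^d for lags i = 0, ..., L, then the L + 1 lagged terms collapse into d + 1 constructed regressors, and only the gammas need to be estimated (for example, by least squares). The Python sketch below builds those regressors and recovers the implied lag weights. It omits endpoint constraints and the TSLS estimation step, and the function names are hypothetical.

```python
def pdl_regressors(x, max_lag, degree):
    """Almon-style PDL regressors z_k[t] = sum_{i=0}^{max_lag} i**k * x[t - i].

    One list per polynomial power k = 0..degree; rows start at t = max_lag,
    the first period for which all lags of x are available.
    """
    return [
        [sum(i ** k * x[t - i] for i in range(max_lag + 1))
         for t in range(max_lag, len(x))]
        for k in range(degree + 1)
    ]

def lag_weights(gammas, max_lag):
    """Lag coefficients beta_i = sum_k gamma_k * i**k implied by the polynomial."""
    return [sum(g * i ** k for k, g in enumerate(gammas))
            for i in range(max_lag + 1)]
```

Regressing y on the d + 1 columns of `pdl_regressors(x, L, d)` estimates the gammas; `lag_weights` then maps them back to the L + 1 lag coefficients, so that sum_i beta_i x[t - i] equals sum_k gamma_k z_k[t].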


After estimating your model, you can perform post hoc analysis using the 
CONSTRAIN command. 


19. REGRESS, ANOVA, GLM - Assumption Checking
SYSTAT offers three tests for checking the normality assumption in Least-Squares Regression, ANOVA and General Linear Models:

• Kolmogorov-Smirnov (Lilliefors). A nonparametric test used for large samples. It is applied to continuous distributions and gives greater importance to the observations in the center than to those in the tails.


• Shapiro-Wilk. The test provides the Shapiro-Wilk test statistic and p-value for the residuals: the smaller the p-value, the worse the fit.
• Anderson-Darling. The Anderson-Darling test is a standard goodness-of-fit test. It gives greater importance to the observations in the tails than to those in the center.
SYSTAT now provides Levene's test to check the homogeneity of variance across groups.
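Levene's statistic is a one-way ANOVA F computed on absolute deviations from each group's center. The Python sketch below implements the textbook version. Whether SYSTAT centers on group means or on group medians (the Brown-Forsythe variant) is not stated in this section, so both are offered; the function name is illustrative.

```python
import statistics

def levene_w(groups, center="mean"):
    """Levene's W for a list of groups; center="median" gives Brown-Forsythe.

    Compare W with an F distribution on (k - 1, N - k) degrees of freedom.
    """
    centre = statistics.fmean if center == "mean" else statistics.median
    # Absolute deviations of each observation from its own group's centre
    z = []
    for g in groups:
        c = centre(g)
        z.append([abs(v - c) for v in g])
    k = len(z)
    n = sum(len(zi) for zi in z)
    group_means = [statistics.fmean(zi) for zi in z]
    grand_mean = sum(sum(zi) for zi in z) / n
    between = sum(len(zi) * (m - grand_mean) ** 2 for zi, m in zip(z, group_means))
    within = sum((v - m) ** 2 for zi, m in zip(z, group_means) for v in zi)
    return (n - k) / (k - 1) * between / within   # undefined if within == 0
```

Groups that differ only in location give W = 0, while a group with a much larger spread drives W up.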
20. ANOVA, GLM, MANOVA - Sums of Squares
Previous versions of SYSTAT used only Type III sums of squares. Type III remains the default, but SYSTAT now also provides Type I and Type II sums of squares as options in ANOVA, GLM, and MANOVA.
21. ANOVA, GLM, MANOVA - Contrasts
Three new contrast codings have been added to augment the existing suite of six contrast codings.
• Helmert. Compares the mean of each level of the selected factor with the mean of the succeeding levels.
• Reverse Helmert. Compares the mean of each level of the selected factor with the mean of the previous levels.
• Deviation. Compares the mean of the dependent variable for each level of the selected categorical variable (except a reference level) with the overall mean (grand mean) of the dependent variable.


• Simple. Compares the mean of each level of the selected factor (except a reference level) with the mean of the specified reference level.


Note: The Difference contrast (DIFFERENCE) has been renamed Adjacent difference (ADJDIFF).
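The four codings can be written out as contrast coefficient rows over the levels of a factor, each row summing to zero. The Python sketch below uses exact fractions, 0-based level indices and an unweighted grand mean for the deviation coding; these conventions and the function names are assumptions for illustration and may not match SYSTAT's internal coding or sign conventions.

```python
from fractions import Fraction

def helmert(k):
    """Level i versus the mean of the levels after it (k - 1 rows)."""
    rows = []
    for i in range(k - 1):
        later = k - 1 - i
        rows.append([Fraction(0)] * i + [Fraction(1)] + [Fraction(-1, later)] * later)
    return rows

def reverse_helmert(k):
    """Level i versus the mean of the levels before it (levels 1..k-1)."""
    return [[Fraction(-1, i)] * i + [Fraction(1)] + [Fraction(0)] * (k - 1 - i)
            for i in range(1, k)]

def deviation(k, reference):
    """Each non-reference level versus the unweighted grand mean of all k levels."""
    return [[Fraction(1 if j == i else 0) - Fraction(1, k) for j in range(k)]
            for i in range(k) if i != reference]

def simple(k, reference):
    """Each non-reference level versus the reference level."""
    return [[Fraction(1 if j == i else -1 if j == reference else 0) for j in range(k)]
            for i in range(k) if i != reference]
```

For a three-level factor, `helmert(3)` gives the rows (1, -1/2, -1/2) and (0, 1, -1): the first level against the average of the other two, then the second level against the third.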
22. ANOVA, GLM - Pairwise Comparisons


After fitting a model in ANOVA or GLM, you can find the treatment pairs that are significantly different, or form homogeneous sets of treatments with their respective p-values, using the multiple comparison tests offered by SYSTAT under the equal or unequal variance assumption. Ten new multiple comparison tests have been added to augment the existing suite of five tests.


• Equal Variances
  - Sidak
  - Tukey's B
  - Duncan
  - Ryan-Einot-Gabriel-Welsch Q
  - Hochberg's GT2
  - Gabriel
  - Student-Newman-Keuls
• Unequal Variances
  - Tamhane's T2
  - Games-Howell
  - Dunnett's T3


These tests can be requested simultaneously.
23. Cluster Analysis
• The following new linkage methods have been added:
  - uniform and k-neighborhood
  - flexible beta
  - weighted linkage
• The following new distance measures have been provided:
  - Absolute
  - Anderberg
  - Jaccard
  - Mahalanobis
  - RT
  - Russel
  - SS
Five cluster validity indices have been provided. Cluster validity indices are 
statistical criteria for choosing an appropriate number of clusters from the 
hierarchical tree. Options for cutting (pruning) and coloring the hierarchical tree 
are also provided. 

The K-clustering procedure now provides an additional partitioning algorithm, 
K-Medians. Nine methods for selecting initial seeds are available for both K-Means 
and K-Medians. 
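A minimal Python sketch of the K-Medians idea follows. It assumes the common Lloyd-style formulation: assign each case to the nearest center by city-block (L1) distance, move each center to the coordinate-wise median of its cases, and repeat until the centers stop changing. The seeds are passed in explicitly; SYSTAT's nine seed-selection methods are not reproduced. The function names are hypothetical.

```python
import numpy as np


def nearest_center(X, centers):
    """Index of the closest center for each case, by city-block (L1) distance."""
    dist = np.abs(X[:, None, :] - centers[None, :, :]).sum(axis=2)
    return dist.argmin(axis=1)


def k_medians(X, seeds, max_iter=100):
    """Lloyd-style K-Medians: alternate between assigning cases to the nearest
    center and moving each center to the coordinate-wise median of its cases."""
    X = np.asarray(X, dtype=float)
    centers = np.asarray(seeds, dtype=float).copy()
    for _ in range(max_iter):
        labels = nearest_center(X, centers)
        new_centers = np.array([
            np.median(X[labels == j], axis=0) if np.any(labels == j) else centers[j]
            for j in range(len(centers))  # an empty cluster keeps its old center
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return nearest_center(X, centers), centers


X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centers = k_medians(X, seeds=[[0, 0], [10, 10]])
print(labels, centers)
```

Using medians and L1 distance rather than means and squared Euclidean distance makes the centers less sensitive to outlying cases.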

Facilities for cutting cluster trees based on leaf nodes and tree heights have been 
added. 


Survival Analysis 

SYSTAT provides the Nelson-Aalen estimator of the cumulative hazard, user-specified 
survival quantiles based on the Kaplan-Meier probability, and user-specified 
confidence intervals for the Kaplan-Meier probability, the Nelson-Aalen cumulative 
hazard, the survival quantiles, and the mean survival time. 
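The point estimates behind these options can be sketched in Python for right-censored data. This is not SYSTAT's code, and the function names are hypothetical. At each distinct event time t with d events and n cases still at risk, the Kaplan-Meier estimate multiplies S by (1 - d/n) and the Nelson-Aalen estimate adds d/n to H. The quantile function uses one common convention: the smallest event time at which S(t) falls to the requested level. Confidence intervals and the mean survival time are omitted.

```python
def km_nelson_aalen(times, events):
    """Kaplan-Meier survival S(t) and Nelson-Aalen cumulative hazard H(t) at
    each distinct event time. events[i] is 1 for an observed event and 0 for
    a right-censored time. Rows are (t, n_at_risk, d, S, H)."""
    data = list(zip(times, events))
    s, h = 1.0, 0.0
    table = []
    for t in sorted({u for u, e in data if e}):
        n_at_risk = sum(1 for u, _ in data if u >= t)  # still under observation at t
        d = sum(1 for u, e in data if u == t and e)  # events at exactly t
        s *= 1.0 - d / n_at_risk  # Kaplan-Meier product-limit step
        h += d / n_at_risk  # Nelson-Aalen increment
        table.append((t, n_at_risk, d, s, h))
    return table


def survival_quantile(table, level):
    """Smallest event time at which S(t) <= level (level = 0.5 gives the
    median); None if the estimated curve never falls that low."""
    for t, _, _, s, _ in table:
        if s <= level:
            return t
    return None


tab = km_nelson_aalen([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
for row in tab:
    print(row)
print("median survival:", survival_quantile(tab, 0.5))
```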

A Cox-Snell residual plot is now displayed as a Quick Graph for Cox and parametric 
models. 
Introducing SYSTAT 


Keith Kroeger 
(revised by Rajashree Kamath) 


SYSTAT provides a powerful statistical and graphical analysis system in a graphical 
environment using descriptive menus and simple dialog boxes. Most tasks can be 
accomplished simply by pointing and clicking the mouse. 

This chapter provides an overview of the windows, menus, dialog boxes, and Online 
Help available in SYSTAT. For information on using SYSTAT's command language, 
see Chapter 5. 


User Interface 


The user interface of SYSTAT is organized into three spaces: 
• Viewspace 
• Workspace 
• Commandspace 


Each space in turn consists of tabs and allows you to accomplish specific tasks. One 
space, and one tab within it, will always be active. All menu selections and editing 
apply only to this tab. To make a tab active, click it with the mouse, or select its name 
from the View menu. The user interface provides menus for running statistical 
analyses and producing graphs. It also contains toolbars to provide quick access to 
many standard statistical techniques and graphs. 
Viewspace 


The Viewspace consists of four tabs: 

• Startpage 
• Output Editor (untitled .syo upon opening) 
• Data Editor (untitled .syz upon opening) 
• Graph Editor (graph1, when a graph is in the Output Editor) 
Startpage. The Startpage is divided into five panes: 


• Recent Files, containing a list of recently opened data, command and output files; 
you can reopen a file by double-clicking its name. 
• Themes, containing a list of menu themes; double-click a theme to apply it to the 
SYSTAT window. 
• Manuals, containing a list of the user manual documents; you can open a volume by 
double-clicking its name. 
• Tips, providing useful tips about SYSTAT's features and how to accomplish a given 
task; click Next Tip to scroll through the tips. 
• Scratchpad, for writing notes while you work with SYSTAT. Anything that you 
enter here is retained across sessions. 


Click the bar at the top of the Startpage to learn about the new features in the 
current version of SYSTAT. You can close the Startpage if you do not need it for the 
rest of a session, or prevent it from appearing when SYSTAT restarts. 


Output Editor. Graphs and statistical results appear in the Output Editor. Collapsible 
links are created for each analysis or graph that you request. You can thus hide output 
that you do not need to see all the time. Simply click on the link once to collapse the 
corresponding output; click again to expand it. 


You can perform some of the Output Editor-related operations using the Format Bar 
that is in the Toolbar area of the SYSTAT window. For more information about the 
Output Editor, see Chapter 6. 




Data Editor. The Data Editor displays your data in a row-by-column format. 
Each row is a case and each column is a variable. You can type new data into an empty 
Data Editor, or you can edit and transform data. 


• To define a variable, right-click a column and choose Variable Properties. This 
opens the Variable Properties dialog box, where you can name the variable, supply 
a label for it, select the variable type, indicate whether it is categorical, set display 
options, and specify comments. 
• Use the Edit menu to cut, copy, delete, and paste rows, columns, and blocks of data. 
• Use the Data menu to transform data and select subsets of cases. 


The data file that you create or open for use is called the active data file. You can also 
view any number of data files using the File menu or the Output Organizer; a new tab 
is created for each file that you view. Once you view a file, you can make it active 
using the context menu; the previously active file then automatically goes into view 
mode. You can thus have any number of data files open in the Data Editor, ready for 
use at a click of the mouse. 


Variable Editor. The Data Editor lets you edit data values directly in the grid that you 
see by default, which is its Data tab. It also lets you edit the properties of variables 
directly in its Variable tab, which we will refer to as the Variable Editor. The Variable 
Editor has one row for each variable, and each row includes all the items in the 
Variable Properties dialog. With it, you can: 


• Set any of the properties for any variable with a single click of the mouse. 
• View and set the processing conditions in effect for the current data set, that is, 
the frequency, weight, category and grouping variables (if any are defined) and 
any case selection conditions. 


You can perform some of the Data Editor-related operations using the Data toolbar that 
is in the Toolbar area of the SYSTAT window. See SYSTAT Data for more information 
about the Data Editor. 


Graph Editor. Double-clicking a graph in the Output Editor or just clicking the Graph 
Editor tab opens the Graph Editor. 


Use the Graph Editor toolbar and menus to edit graphs. You can: 
• Insert annotations and other text. 
• Change font, color, fill, surface and line attributes. 
• Rescale axes. 
• Modify plot symbols. 
• Customize labels. 
• Edit legends. 
• Identify individual points in scatterplots. 
• Select a subset of cases using the Rectangular or Lasso tool. 
• Zoom and rotate graphs. 
• Change many other properties of a graph: for example, change its type, draw 
various smoothers, specify gradients for surfaces, connect and partition plot 
points, slice pie charts, and set attributes for each individual axis line. 


You can also view any number of graphs using the context menu of the Output 
Organizer. You can perform many of the Graph Editor-related operations using the 
Graph Editing toolbar that is in the Toolbar area of the SYSTAT window. See SYSTAT 
Graphics for more information about the Graph Editor. 


By default, the tabs of the Viewspace are arranged in the following order: 
• Startpage 
• Output Editor 
• Active Data File 
• Graph Editor 
• Viewed Graphs 
• Viewed Data Files 


When a new tab is opened, it is inserted at the beginning of its group. If you want a 
new tab to appear as the first tab of the Viewspace instead, click the arrow in the top 
right corner of the Viewspace and check [Active Tab at the Beginning]. You can bring 
a tab into focus by clicking the arrow and checking the name of the desired tab. If 
there are more tabs than can be displayed in the Viewspace, the selected tab becomes 
the first tab in the Viewspace or in its group, depending on whether [Active Tab at 
the Beginning] is checked. This is especially useful when you have many tabs open in 
the Viewspace. 


For most of the tabs, you can close the tab in focus by right-clicking and selecting 
Close or pressing the Close button in the top right corner of the Viewspace. 
Workspace 


The Workspace consists of three tabs: 
• Output Organizer 
• Examples 
• Dynamic Explorer 


Output Organizer. Use the Output Organizer primarily to navigate through the results 
of your statistical analysis. Selecting a completed procedure from the outline displays 
the corresponding results in the Output Editor. You can also use the Output Organizer 
to select an item, and then copy, paste, delete, or move it, allowing you to tailor 
SYSTAT's output to your preferences. In addition, you can quickly move to specific 
portions of the output without having to use the Output Editor scrollbars. 


For more information about the Output Organizer, see Chapter 6. 


Examples. Use the Examples tab to execute the command scripts given in the user 
manual with a click of the mouse. The SYSTAT Examples tree is organized into folders 
and nodes, with a folder for each volume of the user manual. Double-click a node to 
run the underlying commands. You can also open these command scripts in the 
Commandspace for editing, and create links to your own command files for easy 
execution. You can even add example nodes to this tab using the Utilities menu. 
See Chapter 5 for more information about the Examples tab. 


Dynamic Explorer. The Dynamic Explorer becomes active when there is a graph in the 
Graph Editor, and the Graph Editor is active. Use the Dynamic Explorer to: 


• Rotate and animate 3-D graphs. 
• Zoom the graph in the direction of any of the axes. 


See SYSTAT Graphics for more information about the Dynamic Explorer. 


Commandspace 


The Commandspace has three tabs: 
• Interactive 
• Batch (Untitled) 
• Log 


Interactive. Selecting the Interactive tab enables you to enter commands in the 
interactive mode, which issues the command after you press the Enter key. You can 
save the contents of the interactive tab (excluding the > prompts) and then use the file 
to submit a sequence of commands. 


Batch (Untitled). Selecting the Batch (Untitled) tab enables you to work with 
command files in batch mode. You can open any number of existing command files, 
and edit or submit any of them. You can also type in an entire set of commands and 
then save or submit it. The name that you specify when saving this content replaces 
the caption 'Untitled' on the tab. 


Log. Selecting the Log tab enables you to examine the read-only log of the commands 
that you have run during your session. You can save the command log or even submit 
one or more of the generated commands. 


By default, the tabs of the Commandspace are arranged in the following order: 
• Interactive 
• Log 
• Command Files 


When a new tab is opened, it is inserted at the beginning of its group (Batch). You can 
click the arrow in the bottom right corner of the Commandspace and check [Active Tab 
at the Beginning] if you want a new tab to appear as the first tab of the Commandspace. 
You can bring a tab into focus by clicking the arrow and checking the name of the 
desired tab. If you have opened more than 9 command files, the tab becomes the first 
tab in the Commandspace or in its group depending on whether [Active Tab at the 
Beginning] is checked or not. This is especially useful when you have a lot of tabs open 
in the Commandspace. 


You can close the tab in focus by right-clicking and selecting Close or pressing the 
Close button in the bottom right corner of the Commandspace. You can close all open 
command files by right-clicking in any tab of the Commandspace and selecting Close 
All. 


Reorganizing the User Interface 


The Workspace, Viewspace and Commandspace can be resized if desired. To do so: 
• Drag the boundaries of the panes (between Viewspace and Workspace, Workspace 
and Commandspace, and Viewspace and Commandspace) in the desired direction. 


You can also reposition the panes. For this: 

• Click the upper boundaries of the panes and drag the resulting outline to the new 
position. As you drag the outline, the border thins to indicate that the item will be 
docked to the main window at that location. To prevent docking, drag the item off 
the main window or hold down the Ctrl key as you drag. Double-clicking the upper 
boundary can undock docked items. Undocking items enlarges the remaining 
panes but can result in a cluttered desktop. 


The tabs of the Viewspace can be tiled so that you can view any two of them 
simultaneously. To do this: 

• Click the Window menu, or right-click the toolbar area, and select Tile or Tile 
Vertically. All the panes in the Viewspace are laid out in tiled fashion. 
Double-click one of the title bars to dock the panes in their default or previously 
docked positions. 

Every toolbar can be repositioned by clicking and dragging its move handle. 
Toolbars can also be dragged and docked to the boundary between the Viewspace and 
Workspace. The Format Bar, Data and Graph Editing toolbars can be toggled by 
right-clicking the Output Editor, Data Editor and Graph Editor tabs, respectively, and 
selecting Show Toolbar. 

You can also close the Workspace, Commandspace and toolbars so that more space is 
available for viewing output, data and graphs. To do so: 

• Undock them and click the Close button in their upper right corner, or deselect 
their entry on the View menu. Closed items can be reopened from the View menu 
or by using the keyboard. Keyboard shortcuts are explained in Chapter 7. 


SYSTAT has a common menu bar for all the panes and tabs. There are menus for 
opening, saving, and printing files, editing output, transforming data, matrix 
manipulation, generating experimental designs and random samples, performing 
statistical analyses, and creating graphs. At any given time, only the menu items that 
are relevant to the active pane or tab are enabled. The menus can be customized using 
the Customize dialog from the View menu. 
File. Use the File menu to create or open data, command and output files, import from 
databases, and save the contents of the active pane, all panes, or newly created data 
files. The supported data file formats include SYSTAT, Excel, SPSS, SAS, MINITAB, 
S-PLUS, Statistica, Stata, JMP and ASCII. You can save command files or the 
command log, and submit commands from the Commandspace, a command file, the 
Windows clipboard, or a command file list. You can save output in the SYSTAT 
(.syo) or HTML (.mht) format. You can also define page and printer settings, and 
preview and print the contents of the Output Editor, Data Editor, or Graph Editor. 
Graphs can be reviewed using the Page Mode under the View menu. When the Graph 
Editor is active, you can also export and print graphs. You can export graphs in a 
variety of formats including WMF, PS, EPS, BMP, JPEG, GIF, TIFF, PNG, PCT and 
CGM. The File menu can also be used to open recent data, command, and output files. 


Edit. Use the Edit menu to paste clipboard content into the active pane; define 
output-related settings such as ID variables, the order in which data values are 
displayed, and the display of variable and value labels; and change SYSTAT options, 
including the variable display order in dialog boxes, the algorithm used for random 
number generation, the behavior of the Enter key in the Data Editor, font 
characteristics for output, data and graphs, the display of statistical Quick Graphs, the 
inclusion of command syntax in the output, measurement units for graphs, reduction 
or enlargement of graphs, and file locations. 


• Output Editor. In addition to the above options, when the Output Editor is active, 
you can undo/redo a few steps of output, cut, copy, and paste statistical output and 
other text from and into the Output Editor, find and replace text strings, clear text 
and output, change font characteristics (including color and size), create numbered 
and bulleted lists, outdent/indent text, align text, tables and graphs, insert images 
and page breaks into your output, and collapse/expand links created by graphical 
and statistical procedures. 

• Data Editor. When the Data Editor is active, you can also undo/redo up to 32 data 
editing operations, cut, copy and paste data from and into the Data Editor, add 
empty rows in a new or existing data file, insert/delete cases and variables, find a 
specific variable, find/replace occurrences of a string or number in any given 
column, and go to a desired cell. 

• Graph Editor. When the Graph Editor is active, you can also copy graphs. 

• Output Organizer. When the Output Organizer is active, you can also cut, copy, 
paste and insert tree folders, set the selected data file node as active, rename nodes, 
expand/collapse trees and see detailed node captions. 
View. Use the View menu to view or hide the Workspace, Commandspace, Startpage, 
processing conditions, toolbars and status bar, make tabs active, and launch a full 
screen view of the Viewspace. This menu also allows you to create and customize 
toolbars, keyboard shortcuts and context menus. When the Output Editor is active, you 
can also view graphs as frames only. When the Graph Editor is active, use the View 
menu to switch between the Graph View and Page View, and turn the display of rulers 
and graph tooltips on and off. 


Data. Use the Data menu to define categorical variables, transform (including recode) 
data values, rank, center or standardize data, trim extreme values, sort cases in the data 
file based on the values of one or more variables, transpose cases (rows) and variables 
(columns), wrap/unwrap or stack variables, merge data files (cases or variables), define 
ID variables and order of display of data values, specify grouping variables that split 
the data file into two or more groups for analysis, select and extract subsets of cases, 
list data in the Output Editor, define case frequencies, and weight data for analysis 
based on the value of a weight variable. When the Data Editor is active, you can also 
define variable properties and value labels, as well as edit data. 


Utilities. Use the Utilities menu to access SYSTAT's MATRIX procedure, perform 
probability calculations, generate random samples from a variety of univariate discrete 
and continuous probability distributions, generate a variety of experimental designs, 
perform power analysis and calculations involving functions available in SYSTAT 
(including probability calculations), retrieve data file information and current SYSTAT 
settings, record macros (command scripts generated from your actions) and play them 
back, create command file lists and customized user menus, access recently invoked 
dialogs, save, apply and download SYSTAT menu themes, and add examples to the 
Examples tab. 


Graph. Use the Graph menu to access the Graph Gallery and to create function plots, 
summary charts like pie, bar, line, profile, pyramid and high-low-close, density 
displays like histograms, dot densities and box plots, distribution plots like density 
functions, probability plots and quantile plots, scatterplots, scatterplot matrices, 
parallel coordinate displays, Andrews' Fourier plots, icon plots and maps. You can 
also overlay various graphs in a single frame. When the Graph Editor is active with a 
graph in it, you can realign any displaced graph frames with their original positions, 
edit various properties of the graph like font attributes of graph/frame titles, axes, tick 
mark, bar and case labels, zoom, rotation, layout (position, size and arrangement), title, 
background color, type (for summary and density charts), and coordinate system of 
graphs, axes/scale type, tick mark style and location, label, limit lines, grid lines, 
transformations, line style and scale ranges on the graph’s axes, titles, labels, location 
and layout of graph legends, colors and fill patterns for the graph's elements, style and 
size of plot symbols, surface, gradient and wireframe styles, and various options for 
each graph type. The Graph menu also allows you to copy graphs, define text 
annotation font and graph annotation attributes, select the pointer tool or any of the 
annotation tools, select the panning or zooming tools, reset any panning or zooming 
done to a graph, highlight a point in a plot to view the corresponding case in the Data 
Editor, choose the region or lasso selection tools, and show or hide any selection made 
using these tools in the plot. 


Analyze. Use the Analyze menu to run fundamental statistical analyses including 
crosstabulation, column and row basic statistics and stem-and-leaf plots, fitting 
distributions, correspondence analysis, loglinear models, nonparametric and 
multinormal tests, hypothesis testing, simple as well as set and canonical correlations, 
Cronbach’s alpha, linear and robust regression methods, logistic regression, probit 
analysis, two-stage least squares, mixed as well as nonlinear regression methods, 
nonparametric smoothing, univariate and multivariate analysis of variance, general 
linear models, mixed models, discriminant (classical and robust), cluster as well as 
factor analyses, plotting, transforming, and smoothing time series, autocorrelation and 
cross correlation functions, seasonal adjustment, ARIMA, trend analysis, and Fourier 
transformation. 


Advanced. Use the Advanced menu to perform advanced statistical analyses like 
missing value analysis, quality analysis (including Pareto, Box-and-Whisker, various 
control charts like Shewhart and X-MR, ARL and OCC computation, and process 
capability analysis), nonparametric, Cox and parametric survival analysis, response 
surface methods (estimation, optimization and plotting), path analysis, conjoint 
analysis, multidimensional scaling, perceptual mapping, partially ordered scalogram 
analysis, test item analysis, signal detection analysis, spatial statistics, and CART. 


Quick Access. Use the Quick Access menu to quickly access all the commonly used 
statistical procedures. You may want to customize this menu to contain those analyses 
that you frequently use so that you may access all of them in a single location. 


Window. Use the Window menu to cascade, tile horizontally/vertically or arrange the 
tabs of the Viewspace. 


Help. Use the Help menu to access SYSTAT’s online Help system (Contents, Index or 
Search, Acronym Expansions), Frequently Asked Questions (FAQ), and Tutorials on 
various SYSTAT features, update the license for running SYSTAT beyond the 
specified period, check for updates to the current version of SYSTAT, access the 
SYSTAT website, and display the copyright, version number and license information 
of your copy of SYSTAT. 


Context Menus 


SYSTAT provides several context menus that pop up on right-clicking in various 
components (tabs or nodes in the three spaces) of its interface. The available menus are 
listed below with a brief description of each. 


Startpage. You can specify whether you want the Startpage to show at startup, clear the 
recent data, command and output files listed in the Recent Files pane, refresh the 
content of the Startpage, close it for the rest of the session, and invoke the 
Edit: Options dialog. 


Output Editor. You can cut or copy the selected content in the Output Editor to the 
Windows clipboard, paste content from the clipboard to the Output Editor, copy all the 
content in the Output Editor to the clipboard, view the HTML source, refresh, or 
preview the content for printing, collapse/expand links in the output, show the Format 
Bar, create a new output file, clear all or save the content in the Output Editor, and 
invoke the Edit: Options dialog. 


Data/Variable Editor. You can copy all the content in the Data Editor, set one of the 
data files that are being viewed in the Data Editor as the active data file, switch between 
the Data and Variable Editors, enter or view and edit comments for a data file, preview 
the content in the active tab for printing, show the Data toolbar, create a new data file, 
save the active data file, invoke the Edit: Options dialog, close a data file that is active 
or being viewed, and show the processing conditions in effect (if the Variable Editor is 
active). 


Graph Editor. You can invoke the Graph Properties dialog, animate a 3-D graph, 
realign any graph frames you may have moved from their original positions, copy or 
preview (for printing) the graph in the Graph Editor, show the Graph Editing toolbar, 
save the graph that is in the Graph Editor, invoke the Edit: Options dialog, and close 
the Graph Editor. 


Output Organizer. You can rename tree nodes and folders, expand or collapse the 
entire tree including any tree folders or multilevel nodes, insert tree folders, create a 
new output file, clear all or save the content in the Output Editor, and request detailed 
node captions. When a data node is selected, you can also view the underlying data file 
or set it as the active data file. When a text node is selected, you can also cut or copy it 
(and the corresponding output in the Output Editor) to the clipboard, paste one or more 
nodes after copying them to the clipboard, or delete it (which also deletes the 
corresponding content in the Output Editor). When a graph node is selected, you can 
also view the corresponding graph in the Graph Editor. 


Examples. You can run the underlying example command file(s), expand or collapse 
the entire tree including any sub-folders or multilevel nodes. When an example node 
(not folder) is selected, you can also open the underlying command file in the Batch 
tab of the Commandspace. 


Commandspace. Apart from the various options for editing and submitting 
commands, you can right-click on the Batch tab to create a new command file, open 
an existing command file, save the content of the tab, or close the tab. 


In addition to these, context menus are available for cells, columns and rows in the 
Data Editor; command files in the Batch, Interactive and Log tabs of the 
Commandspace; dialog box elements; the status bar; and the toolbar area. These menus 
provide shortcuts to data editing, command submission, dialog actions, status bar 
content and menu actions, respectively. 
Most menu selections in SYSTAT open dialog boxes, which you use to select variables 
and options for analysis. Each dialog box may have several basic components in 
separate tabs. 


[Figure: the Regression: Linear: Least Squares dialog box, showing the Available 
variable(s) source list and the Dependent and Independent(s) target lists.] 


Tabs. Since many SYSTAT commands provide a great deal of flexibility, not all of the 
possible choices can be contained in a single dialog box. The main dialog box usually 
contains the minimum information required to run a command. Additional 
specifications are made in tabs. You can make a tab active by clicking it with the 
mouse. Some tabs are enabled only after the required input has been given in other 
tabs, and a tab may be disabled if its contents are irrelevant to the current selections. 


Command pushbuttons. Buttons that instruct SYSTAT to perform an action: 

• Runs the procedure for the selections you have made. In some dialog boxes, this 
button is not enabled until the minimum required input is given. 
• Cancels the procedure. Any selections you have made are discarded. 
• Displays help related to the dialog box. If a dialog box has more than one tab, you 
get help related to the active tab. 
• Resets the selections in the dialog box, or in the active tab, to the defaults. 
• Resets the selections for all tabs in the dialog box. 


Source variable list. A list of variables in the working data file. Only variable types 
(numeric and/or string) allowed by the selected command are displayed in the source 
list. 


Target variable list(s). One or more lists, such as dependent and independent variable 
lists, indicating the variables you have chosen for the analysis. If an analysis requires 
you to choose variables in a list, you will see '<Required>' in that list. If a list is 
empty, all variables in the source list will be used for the analysis. 


Special lists. Some dialog boxes display lists with multiple columns, in which you can 
enter as many rows as you need. Such lists can be edited using two buttons: 

• Insert a new row by pressing the insert icon. 
• Delete a row by pressing the delete icon. 


Pushbuttons. Dialog boxes contain pushbuttons for performing the following tasks: 

• Add one or more variables to the desired target list by selecting them and then 
pressing the corresponding Add -> button. Alternatively, right-click a variable or 
selection and select the "Add to target list" item corresponding to the desired 
target list. 
• Remove one or more variables from a target list by selecting them and then 
pressing the corresponding <-- Remove button. Alternatively, right-click a 
variable or selection and select Remove. 
• "Cross" a variable in the source list with one in the target list by selecting them 
and then pressing the Cross -> button. You can also add crossed terms of multiple 
variables directly by selecting these variables in the source list and pressing the 
Cross button. 
• Use the #-> button when you want to include the variables as well as all their 
crossed terms. You can also use this button with multiple variables. 
• Use the Nest-> button to include nested terms in the target list. 
Selecting variables. To add a single variable to the desired target list, highlight it in 
the source variable list and click the Add -> button. Use the <-- Remove button to 
undo your selection. You can also double-click individual variables to move them 
from the source list to the target list, or vice versa. When there is more than one 
target list, double-clicking applies to only one of them. 


You can also select multiple variables: 


• To highlight multiple variables that are grouped together in the variable list, click 
and drag the mouse cursor over the variables you want. Alternatively, click the first 
one and then Shift-click the last one in the group. 
• To highlight multiple variables that are not grouped together in the variable list, 
use the Ctrl-click method: click the first variable, and then Ctrl-click the other 
variables that you want. Avoid the name area while clicking and dragging. 
• To select all the variables in a list, click inside the list and press Ctrl+A, or 
right-click and select Select All. 


You can also right-click a variable or a highlighted set of variables and use the menu 
that pops up to add them to the desired target list, or remove them from the list. 


Additional Features. Several additional features have been provided for dialog boxes: 

• Keyboard shortcuts as an alternative to clicking check boxes and radio buttons: hold 
down the Alt key and press the underlined letter in the caption. 
• The Tab key, to move between items. 
• Tooltips for edit texts that take numeric values, indicating the valid range; they 
appear when you pause the mouse over the edit text. 
• Edit texts that take integer values do not accept the decimal separator as input. 
• Edit texts that take nonnegative values do not accept the negative (-) sign as input. 
• Edit texts for the names of files to be opened or saved, for features that require or 
support such options. Type the desired filename (with path), or press the browse 
button and select a file. 
Getting Help


SYSTAT uses the standard HTML Help system to provide information you need to use 
SYSTAT and to understand the results. This section contains a brief description of the 
Help system and the kind of help provided with SYSTAT. 

The best way to find out more about the Help system is to use it. You can ask for 
help in any of these ways: 

■ Click the Help button in a SYSTAT dialog box. This takes you directly to a topic
describing the use of the dialog box, and is the fastest way to learn how to use it.

■ Right-click any dialog box item and select What's This? to get help on that
particular item.

■ Hover the mouse over a menu item that opens a dialog box and press F1 to get help
on that dialog box.

■ Select Contents or Search from the Help menu.


For help on any term or phrase that is listed in the Help Index, from the command 
prompt (on the Interactive tab of the Commandspace) type: 


HELP "[phrase]"


The quotes are required only if the phrase contains spaces. This is very useful if you
need help on SYSTAT commands. Refer to the Command Language chapter for details.


Alternatively, type the term or phrase in any tab of the Commandspace, right-click on 
it and select HELP phrase. You will need to select the whole phrase before you 
right-click if it contains spaces. 


Navigating the Help System 


The SYSTAT Help system has the following tabs: 


■ Contents. The Contents tab shows the table of contents of the Help system.
Double-click a book icon in the listing to view the contents of that section.
Selecting a topic with a page icon opens the associated Help topic.

■ Index. Provides a searchable index of Help topics. Enter the first few letters of the
term you want to find and then double-click the topic in the list (or click it and press
the Display button) to view it.

■ Search. Offers a full-text search of the Help system. Type the desired keyword and
press the Enter key or the List Topics button. The Help system returns all topics
containing the specified term. Double-click the desired topic in the list (or click it
and press the Display button) to view it. Check Search previous results to search for
the keyword within the previously listed topics. By default, all word forms of the
keyword are located. Uncheck Match similar words if you want only the exact keyword
to be located. Check Search titles only if you want to confine the search to page
titles.


■ Favorites. Allows you to create and use a list of favorite Help topics. The topic that
you are currently viewing automatically appears in the Current topic box. You can
either press Add to add this topic to the list, or type the title of a page that you know
exists in the Help system and then press Add. Select a topic in the list and press the
Display button (or the Enter key) to view the topic. Use the Remove button to remove
a selected topic from the list.


The following buttons are available in the toolbar of the Help system: 


■ Hide/Show. Hides or shows the Contents, Index, and Search tabs.

■ Back. Returns to the previous Help topic.

■ Forward. Moves to the next Help topic, if you pressed the Back button previously.

■ Stop. Stops loading a page.

■ Refresh. Refreshes the currently loaded page.

■ Home. Loads the SYSTAT Help Copyright page.

■ Print. When the Contents tab is active, prints the current topic or all subtopics
under the current heading; when any other tab is active, prints the current page.
Before printing, the Print dialog box opens so that you can specify the desired print
settings.

■ Options. Enables you to do any of the above, access the Windows Internet Options
settings, or specify whether search keywords are highlighted in the listed pages.


Depending on the topic displayed, the following buttons may appear in the current 
Help page: 


■ How To. Provides the minimum specifications for performing the analysis.


■ Syntax. Describes the associated SYSTAT command. SYSTAT's command
language offers some features not available in the dialog boxes.


■ Examples. Offers examples of analyses, including SYSTAT command input and
resulting output. Copy and paste the example input to the Batch tab of the
Commandspace to submit the example as is, or modify the commands to suit your own
analyses before submitting them. Make sure the file paths match the locations of
your files.


■ More. Lists analysis options and related tabs. These topics are particularly useful
for customizing your analyses.


■ See Also. Lists related procedures or graphs.


You can select, cut, copy, paste and print the content of any Help page. 


Examples 


Often, the best way to learn about a procedure is through examples. The Help system
provides several examples for each statistical procedure or graph. Select the example
most relevant to your analysis, or browse the examples to explore SYSTAT's
capabilities.


[Figure: a Help system example page, showing a one-way design that compares the typing
speeds of three groups of typists]
The examples include all SYSTAT input. You can copy and paste the example input
(also available as files in the Command folder of the SYSTAT directory, with links in
the Examples tab of the Workspace) to the Batch tab of the Commandspace to submit
the example as is, or you can modify the commands to reflect your own analyses
before submitting them.

The resulting output, including graphical results, follows the command input. Many 
of the examples include Discussion buttons throughout the output. Pressing any of 
these buttons yields a detailed explanation of the immediately preceding output. There 
may also be examples that are explained in more than one step, in which case More or 
Next buttons will be included in the page. 

Example Command Files. The input commands for each example in the User Manual
or in the Help system are available as command files in the Command folder of the
SYSTAT directory. This provides an alternative way to run the examples. The files
are organized according to the printed manual. Each file contains the commands for one
example and is named using six characters (xxyyzz.syc). The first two characters
represent the corresponding volume of the printed manual as follows:

■ 'da' for Data (called 'Data Volume' in the Command folder)

■ 'gs' for Getting Started

■ 'gr' for Graphics

■ 's1' for Statistics I

■ 's2' for Statistics II

■ 's3' for Statistics III

■ 's4' for Statistics IV

■ 's5' for Quality Analysis (if installed)

■ 's6' for Monte Carlo (if installed)

The next two digits represent the chapter number within the volume, and the last two
digits represent the example number within the chapter. The Command folder has nine
subfolders: seven of them correspond to the seven volumes mentioned above, and the
other two are a 'GraphDemo' subfolder and a 'Miscellaneous' subfolder, which contains
the commands of examples that are not numbered. The names of files in the
'Miscellaneous' folder indicate the examples they relate to. For example, to
execute the commands given in Example 1 in Chapter 2 of Statistics III, submit the
's30201.syc' file. (Depending on your file location, you may have to define paths for
files and rename them appropriately.)
Glossary


The glossary offers an alphabetical listing of terms commonly encountered in 
statistical analyses. The buttons at the top of the glossary scroll the window to the 
corresponding letter. Clicking a glossary entry reveals the definition for that term. 


[Figure: the Glossary page of SYSTAT 12 Help, with letter buttons for scrolling to terms]


Application Gallery 


In addition to examples of each procedure, SYSTAT includes examples drawn from 
several fields of research. Chapter 8 provides a brief introduction to each application. 
You can access the complete applications from the Contents tab of the Help system. 
Double-click the Applications book icon and select Application Gallery. The available 
applications are listed with icons and a brief description. Clicking on any icon will 
open a page containing the detailed description, and buttons for the main Application 
Gallery page, Analyses page, and Sources page. 


Chapter 3


SYSTAT Basics 


This chapter provides simple step-by-step instructions for performing basic analysis 
tasks in SYSTAT, including: 


■ Starting SYSTAT.

■ Entering data in the Data Editor.

■ Opening and saving data files.

■ Using menus and dialog boxes to create charts and run statistical analyses.
Starting SYSTAT
To start SYSTAT for Windows XP, 2000, ME, and NT4:


■ Choose:


Start 
Programs 
SYSTAT 12 
SYSTAT 12... 


This section discusses how to enter data. If you prefer to start with data stored in a text 
file, see “Reading an ASCII Text File” on p. 93. 


In the frozen-food section of the grocery store, we recorded this information about
seven dinners:


Brand$            Calories   Fat
Lean Cuisine         240
Weight Watchers      220
Healthy Choice       250
Stouffer             370
Gourmet              440      26
Tyson                330      14
Swanson              300      12

Viewing, entering, and editing data occur in the Data Editor. To open the Data Editor,
either choose Data Editor from the View menu or click the Data Editor tab
(Untitled1.syz) in the Viewspace.


■ Open the Variable Properties dialog box, either from the menu Data->Variable
Properties or by right-clicking VAR (the first column).

■ Type BRAND$ for the variable name. The dollar sign ($) at the end of the variable
name indicates that the variable contains character information.


Note: Variable names cannot exceed 256 characters.


■ In the Variable label edit box, you can type an alias for the variable name.

■ Select String as the Variable type.

■ Choose 15 from the drop-down list in the Characters edit box under the
Display options group.

■ Click OK to complete the variable definition.


■ Repeat this process for the remaining variables, selecting Numeric as the variable
type.
Note: Under Numeric display options, the default number of decimal places is 3. You can
change this, and you can also change the display to Normal, Exponential notation, or
Date and time.

■ Click the top left data cell (under the name of the first variable) and enter the data.

■ To move across rows, press Tab after each entry. To move down columns, press
the Enter key or the Down arrow key.


The Data Editor will look like this: 


[Figure: the Data Editor containing the data for the seven dinners]


■ When you have finished entering the data, from the menus choose:
File
Save As...

■ Select the location for saving the file.


■ Type SAMPLE as the name for the data file. SYSTAT adds the extension .SYZ
(SAMPLE.SYZ).
Reading an ASCII Text File


This section shows you how SYSTAT reads raw (ASCII) data files created in a text
editor or word processor. SYSTAT can import ASCII files of the types .txt, .dat, and
.csv. SYSTAT can read alphanumeric characters, delimiters (spaces, commas, or tabs
that separate consecutive values from each other), and carriage returns. SYSTAT
cannot read an ASCII file that contains unusual ASCII characters, page breaks, control
characters, column markers, or similar formatting codes. See your word processor's
documentation to find out how to save data as an ASCII text file.


Make sure that your text file satisfies the following criteria: 


■ Each case begins on a new line (to read ASCII files with two or more lines of data
per case, use BASIC commands).


■ Missing data are flagged with an appropriate code.


Imagine that someone used a text editor to enter 10 pieces of information (variables)
about 28 frozen dinners:


BRAND$     Short names for brands
FOOD$      Words to identify each dinner as chicken, pasta, or beef
CALORIES   Calories per serving
FAT        Total fat in grams
PROTEIN    Protein in grams
VITAMINA   Vitamin A, percentage of daily value
CALCIUM    Calcium, percentage of daily value
IRON       Iron, percentage of daily value
COST       Price per dinner in U.S. dollars
DIET$      yes if the dinner was shelved with dinners touted as "diet" or low in
           calories; no if it was shelved with regular dinners


Table 3-1

brand$ food$ calories fat protein vitamina calcium iron cost diet$
lc chicken 270 6 22 6 10 6 2.99 yes
lc chicken 240 5 19 30 10 10 2.99 yes
lc chicken 240 5 18 4 10 8 2.99 yes
lc pasta 260 8 15 20 30 2.15 yes
lc pasta 210 4 9 30 10 8 2.15 yes
ww chicken 260 4 21 30 4 15 2.79 yes
ww pasta 220 4 14 15 8 15 2.79 yes
ww pasta 220 6 15 6 25 15 2.79 yes
hc chicken 200 2 17 2 2 2.00 yes
hc chicken 280 3 24 15 4 15 2.00 yes
ww chicken 160 1 13 30 2 2.49 yes
hc pasta 250 3 20 0 8 8 2.00 yes
ww chicken 190 0 12 10 4 2.49 yes
st beef 390 24 20 2 4 15 2.99 no
st beef 370 19 24 2 20 15 2.99 no
st chicken 320 10 27 10 15 8 2.69 no
st chicken 330 16 18 2 2 4 2.99 no
gor beef 290 8 18 15 4 10 1.75 no
gor pasta 370 16 20 30 40 4 1.99 no
gor pasta 440 26 20 100 35 10 1.75 no
gor beef 300 34 22 15 10 20 1.75 no
ty beef 330 14 24 8 10 10 3.00 no
ty chicken 400 8 27 25 0 10 3.50 no
ty chicken 340 7 31 70 0 15 3.50 no
ty chicken 430 24 20 45 4 6 3.00 no
sw chicken 550 25 22 0 6 15 2.25 no
sw beef 330 9 25 10 2 25 2.85 no
sw pasta 300 12 14 0 25 10 1.60 no


The first line contains names for the columns. SYSTAT will count these names (finding 
10), and read 10 values for each case (dinner). We name this ASCII file FOOD.DAT. 
Let us read the FOOD.DAT file and convert it to a SYSTAT file called FOOD.SYZ. 
■ From the menus choose:
File
Open
Data...

■ In the Open dialog box, select All Files from the drop-down list of file types, select
FOOD.DAT, and click OK.


The contents of the data file are displayed in the Data Editor. 
■ From the menus choose:
File
Save As...


■ Type FOOD for the filename in the Save dialog box and click OK.
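The reading rule described above (count the names on the first line, then read that many values per case) can be illustrated with a short sketch. This is not SYSTAT's import code: the function `read_ascii_table` is hypothetical, it handles only whitespace delimiters (not commas or tabs mixed with text qualifiers), and it assumes that a lone period marks a missing value and that names ending in `$` are string variables.

```python
def read_ascii_table(text, missing="."):
    """Hypothetical reader for whitespace-delimited text with a header line.

    The number of names on the first line fixes how many values each case
    must have. Names ending in '$' stay strings; other columns become float.
    The `missing` token (assumed to be '.') becomes None.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    names = lines[0].split()
    cases = []
    for case_number, line in enumerate(lines[1:], start=1):
        fields = line.split()
        if len(fields) != len(names):
            raise ValueError(
                f"case {case_number}: expected {len(names)} values, found {len(fields)}"
            )
        case = {}
        for name, field in zip(names, fields):
            if field == missing:
                case[name] = None
            elif name.endswith("$"):
                case[name] = field
            else:
                case[name] = float(field)
        cases.append(case)
    return names, cases


# Two rows from Table 3-1, reduced to six columns for brevity.
sample = """brand$ food$ calories fat cost diet$
lc chicken 270 6 2.99 yes
st beef 390 24 2.99 no
"""
names, cases = read_ascii_table(sample)
print(len(names), len(cases))  # prints 6 2
```

The sketch stops with an error when a line has the wrong number of values, which makes a dropped or extra value visible immediately instead of silently shifting later values into the wrong variables.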


The subsequent sections show you how to create charts and run statistical analyses
using SYSTAT menus and dialog boxes.


Graphics 


Scatterplots 


Scatterplots provide a visual impression of the relation between two quantitative 
variables. Let us plot CALORIES versus FAT for this larger sample. 


■ From the menus choose:
Graph
Scatterplot...

■ In the Scatterplot dialog box, select FAT as the X-variable and CALORIES as the
Y-variable.

■ Click the Fill tab in the Scatterplot dialog box and select a solid fill for the first fill
pattern.
[Figure: the Main tab of the Scatterplot dialog box, with FAT selected as the X-variable]


■ Click OK to execute the program.
[Figure: scatterplot of CALORIES against FAT]


■ Return to the Scatterplot dialog box by clicking the Scatterplot tool. Notice
that the previous settings are preserved.

■ Click the Smoother tab in the Scatterplot dialog box, and select LOWESS as the
smoother.
[Figure: the Smoother tab of the Scatterplot dialog box, with LOWESS selected]


■ Click OK to execute the program.


The resulting line displays a “typical” calorie value for each value of FAT without 
fitting a mathematical equation to the complete sample. 
[Figure: scatterplot of CALORIES against FAT with a LOWESS smoother]


The smoother indicates, not surprisingly, that foods with a higher fat content tend to 
have more calories. 
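To see how a smoother can produce a "typical" value at each FAT value without one global equation, here is a minimal sketch of locally weighted regression. It is a simplified version of Cleveland's LOWESS (a single pass with tricube weights and no robustness iterations), not SYSTAT's implementation; the function name `lowess` and the default fraction `frac=2/3` are assumptions, and SYSTAT's own defaults may differ.

```python
import math


def lowess(x, y, frac=2 / 3):
    """Simplified LOWESS: a weighted straight line fitted around each x value.

    For each point, the nearest ceil(frac * n) neighbours in x receive
    tricube weights, and the local line is evaluated at that point.
    No robustness (reweighting) iterations are performed.
    """
    n = len(x)
    k = min(n, max(3, math.ceil(frac * n)))  # neighbours per local fit
    fitted = []
    for x0 in x:
        # Bandwidth: distance from x0 to its k-th nearest x value.
        h = sorted(abs(xi - x0) for xi in x)[k - 1]
        weights = []
        for xi in x:
            if h > 0:
                d = abs(xi - x0) / h
                weights.append((1 - d ** 3) ** 3 if d < 1 else 0.0)
            else:
                # All k neighbours share x0's value: use only the ties.
                weights.append(1.0 if xi == x0 else 0.0)
        total = sum(weights)  # > 0, because x0 itself has weight 1
        mx = sum(w * xi for w, xi in zip(weights, x)) / total
        my = sum(w * yi for w, yi in zip(weights, y)) / total
        sxx = sum(w * (xi - mx) ** 2 for w, xi in zip(weights, x))
        sxy = sum(w * (xi - mx) * (yi - my) for w, xi, yi in zip(weights, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


# FAT and CALORIES for ten dinners from Table 3-1.
fat = [6, 5, 8, 4, 24, 19, 10, 16, 26, 34]
calories = [270, 240, 260, 210, 390, 370, 320, 330, 440, 300]
smoothed = lowess(fat, calories)
```

Because each fit uses only nearby points, a single unusual dinner (such as the high-fat gor beef dinner) bends the curve locally without distorting it everywhere.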

Which foods and which brands have the most calories? The fewest calories? The
highest fat content? The lowest fat content?

■ Return to the Scatterplot dialog box.


■ Click the Symbol and Label tab in the Scatterplot dialog box, click Display case
labels in the Case labels group, select BRAND$ to label each plot point with the
brand of the dinner, and set the case label size to 1.3. Repeat these steps for
FOOD$.
[Figure: the Symbol and Label tab of the Scatterplot dialog box, and plots of CALORIES
against FAT labeled by BRAND$ and by FOOD$]
The top point in each plot is a chicken dinner made by sw; it must be fried chicken.
Notice that the beef dinner by gor at the far right (close to the 300-calorie mark)
contains considerably more fat than other dinners in the same calorie range.


Do diet dinners really have fewer calories and less fat than regular dinners? The
dinners in the sample were selected from shelves where both regular and diet dinners
were featured (DIET$ no and yes, respectively).


■ Return to the Scatterplot dialog box.

■ Select DIET$ as the grouping variable.

■ Select Overlay multiple graphs into a single frame.

■ Deselect Display case labels in the Symbol and Label tab, and select None as the
Smoother method in the Smoother tab.

■ Click the Options tab in the Scatterplot dialog box.

■ Select Confidence kernel and enter a p-value of 0.75 for a 75% confidence region.

■ Click OK.


[Figure: scatterplot of CALORIES against FAT, with the DIET$ groups overlaid in a single
frame]


It is clear from the sample that the DIET$ yes dinners have fewer calories and less fat
than the regular dinners.
Using Commandspace


Each time you use a dialog box to perform a step in an analysis, a command is 
generated. These “commands” are SYSTAT's instructions to perform the analysis. 
Instead of using dialog boxes to generate these commands, you can use the 
Commandspace and type them yourself. Whether generated by the dialog box or typed 
manually, the commands from each SYSTAT session can be saved in a file, modified, 
and resubmitted later. Although many users will use dialog boxes exclusively, we 
introduce commands here briefly to show how commands succinctly document the 
steps in your analysis. If you do not expect to use commands, you can skip the
sections that show them.


You can type commands in the Commandspace of the SYSTAT window at the prompt 
(>) on the Interactive tab. When the Log tab is selected in the Commandspace, the 
commands corresponding to your dialog box choices are also displayed in the 
Commandspace. For example, the following command was generated by the 
Scatterplot dialog box selections. 


REM -- Following commands were produced by the PLOT DIALOG:
PLOT FAT*CALORIES
REM -- END of commands from the PLOT DIALOG


If you enter commands on the Interactive tab, you can recall previous commands with
the Up and Down arrow keys or with the F9 key.


Sorting and Listing the Cases 


Detailed graphics and statistics may not always be what you need; sometimes you can
learn a lot simply by looking at numbers. This section shows you how to sort the
dinners by type of food (FOOD$) and, within each food, by fat content.

■ From the menus choose:
Data
Sort File...


■ In the Sort dialog box, select FOOD$ and FAT as the variables, and then click OK.
[Figure: the Sort File dialog box]
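Sorting by FOOD$ and then FAT is a sort on a compound key: FOOD$ decides the order, and FAT only breaks ties within the same food. A minimal Python sketch of the same idea, using six dinners from Table 3-1 (the dictionary layout is an illustration, not a SYSTAT data structure):

```python
# Six dinners from Table 3-1 (only FOOD$, FAT, and BRAND$ are kept).
dinners = [
    {"BRAND$": "lc", "FOOD$": "chicken", "FAT": 6},
    {"BRAND$": "ww", "FOOD$": "chicken", "FAT": 4},
    {"BRAND$": "st", "FOOD$": "beef", "FAT": 24},
    {"BRAND$": "gor", "FOOD$": "beef", "FAT": 8},
    {"BRAND$": "sw", "FOOD$": "pasta", "FAT": 12},
    {"BRAND$": "lc", "FOOD$": "pasta", "FAT": 4},
]

# FOOD$ is the primary key; FAT orders the dinners within each food.
by_food_then_fat = sorted(dinners, key=lambda d: (d["FOOD$"], d["FAT"]))

for d in by_food_then_fat:
    print(d["FOOD$"], d["FAT"], d["BRAND$"])
```

Python's `sorted` is stable, so dinners that tie on both keys keep their file order; whether SYSTAT's Sort File preserves file order for ties is not stated here.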


■ From the menus choose:
Data
List Cases...


■ Select FOOD$, FAT, CALORIES, PROTEIN, and BRAND$ as the variables.
■ In the Format group, enter 7 for Column width and 0 for Decimal places.


[Figure: the List Cases dialog box, with the selected variables]


■ Click OK.


[Output: listing of the sorted cases, with columns Case, FOOD$, FAT, CALORIES, PROTEIN,
and BRAND$]
Within each type of food, the fat content varies markedly. The diet brands ww, lc, and
hc are the first entries under chicken and pasta. If the data file were larger, you would 
have to scan pages and pages of listings and it would be hard to see relationships (see 
the descriptors in the next section). Note that you can sort and list data in any 
procedure. 


A Quick Description 


As an early step in data screening, it is useful to summarize the values of grouping 
variables and to scan summary descriptors of quantitative variables. 


Frequency Counts and Percentages 


The One-Way Frequency Tables procedure on the Analyze menu offers many print options
that allow you to customize exactly what reports appear in your output. For example,
the Frequency distribution option reports the number of times (frequency) each category
of a grouping variable occurs and expresses it as a percentage of the total sample size.
Cumulative frequencies and percentages are also available. In our "grabbing" sampling
strategy, we are interested in knowing what foods we have and how many dinners of each
brand and diet type.

■ From the menus choose:
Analyze
One-Way Frequency Tables...


■ In the Tables group of the One-Way Tables dialog box, select Frequency
distribution.
■ Select FOOD$, BRAND$, and DIET$ as the variables.
[Figure: the One-Way Frequency Tables dialog box, with Frequency distribution selected]

■ Click OK.

Frequency Distribution for FOOD$

FOOD$   | Frequency  Cumulative   Percent  Cumulative
        |            Frequency             Percent
--------+---------------------------------------------
beef    |         6           6    21.429      21.429
chicken |        14          20    50.000      71.429
pasta   |         8          28    28.571     100.000


BRAND$  | Frequency  Cumulative   Percent  Cumulative
        |            Frequency             Percent
--------+---------------------------------------------
gor     |         4           4    14.286      14.286
hc      |         3           7    10.714      25.000
lc      |         5          12    17.857      42.857
st      |         4          16    14.286      57.143
sw      |         3          19    10.714      67.857
ty      |         4          23    14.286      82.143
ww      |         5          28    17.857     100.000


DIET$   | Frequency  Cumulative   Percent  Cumulative
        |            Frequency             Percent
--------+---------------------------------------------
no      |        15          15    53.571      53.571
yes     |        13          28    46.429     100.000




In the output above, for FOOD$ (the name appears at the top left of the first table), 14
of the 28 dinners in the sample (50% in the Percent column) are chicken, 28.6% are
pasta, and 21.4% are beef. The number of dinners per BRAND$ (second table) ranges from
three to five. There are 15 regular (DIET$ no) dinners and 13 diet (DIET$ yes) dinners.
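Each row of a frequency distribution follows from the counts alone: the cumulative frequency is a running sum, and both percentages divide by the total sample size. A minimal sketch that reproduces the FOOD$ table (the function name `frequency_table` is illustrative, and rounding to three decimals mimics the printed output):

```python
from collections import Counter


def frequency_table(values):
    """Rows of (category, frequency, cumulative frequency, percent,
    cumulative percent), with categories in sorted order."""
    counts = Counter(values)
    total = len(values)
    rows = []
    cumulative = 0
    for category in sorted(counts):
        frequency = counts[category]
        cumulative += frequency
        rows.append((
            category,
            frequency,
            cumulative,
            round(100 * frequency / total, 3),
            round(100 * cumulative / total, 3),
        ))
    return rows


foods = ["beef"] * 6 + ["chicken"] * 14 + ["pasta"] * 8
for row in frequency_table(foods):
    print(row)
# ('beef', 6, 6, 21.429, 21.429)
# ('chicken', 14, 20, 50.0, 71.429)
# ('pasta', 8, 28, 28.571, 100.0)
```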


The List layout option in Two-Way Tables in the Analyze menu is useful for 
summarizing counts that result from cross-classifying two factors. Let us look at 
combinations of DIET$ and BRAND$. 


■ From the menus choose:
Analyze
Tables
Two-Way...


■ In the Options group of the Two-Way Tables dialog box, select List layout and
deselect Counts.


■ Select DIET$ as the row variable and BRAND$ as the column variable.
[Figure: the Two-Way Tables dialog box]


■ Click OK.
Frequency Distribution for DIET$ (rows) by BRAND$ (columns)


DIET$  BRAND$ | Frequency  Cumulative   Percent  Cumulative
              |            Frequency             Percent
--------------+---------------------------------------------
no     gor    |         4           4    14.286      14.286
no     st     |         4           8    14.286      28.571
no     sw     |         3          11    10.714      39.286
no     ty     |         4          15    14.286      53.571
yes    hc     |         3          18    10.714      64.286
yes    lc     |         5          23    17.857      82.143
yes    ww     |         5          28    17.857     100.000


There are two DIET$ categories and seven BRAND$ categories, so there could be 14
combinations, but only 7 are shown here: the brands of the diet dinners differ from
those of the regular dinners.
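The list layout reports only the combinations that actually occur, which is why 7 rows appear instead of 2 × 7 = 14. A minimal sketch of that counting rule (the helper `list_layout` is illustrative; the data reproduce the DIET$ and BRAND$ counts above):

```python
from collections import Counter


def list_layout(row_values, col_values):
    """Count each (row, column) pair, keeping only combinations that occur."""
    return sorted(Counter(zip(row_values, col_values)).items())


diet = ["no"] * 15 + ["yes"] * 13
brand = (["gor"] * 4 + ["st"] * 4 + ["sw"] * 3 + ["ty"] * 4
         + ["hc"] * 3 + ["lc"] * 5 + ["ww"] * 5)

pairs = list_layout(diet, brand)
possible = len(set(diet)) * len(set(brand))
print(possible, len(pairs))  # prints 14 7
```

A full two-way table, by contrast, shows every row-by-column cell, including the empty ones as zeros.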

You may want to display frequencies for two factors as a two-way table. Let us
deselect the List layout feature and look at DIET$ by FOOD$.

■ From the menus choose:
Analyze
Tables
Two-Way...


■ Select DIET$ as the row variable and FOOD$ as the column variable.
■ Deselect List layout (click the check box to deselect it if it is currently selected) and
select Frequencies from the table box.


Counts
DIET$ (rows) by FOOD$ (columns)

      | beef  chicken  pasta  Total
------+-----------------------------
no    |    6        6      3     15
yes   |    0        8      5     13
------+-----------------------------
Total |    6       14      8     28


We failed to get any beef dinners in the DIET$ yes group.


Descriptive Statistics 


It is easy to request a panel of descriptive statistics. However, since we have not
examined several of these distributions graphically, we should avoid reporting means
and standard deviations (these statistics can be misleading when the distribution is
highly skewed). It is helpful to scan the sample size for each variable to determine
whether values are missing. The basic statistics include the number of observations
(N), minimum, maximum, arithmetic mean (AM), geometric mean, harmonic mean, sum,
standard deviation, variance, coefficient of variation (CV), range, median, and
standard error of AM.
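The statistics listed above have standard textbook definitions, sketched below with Python's `statistics` module on the PROTEIN values of the 13 diet dinners in Table 3-1. SYSTAT's exact conventions are assumptions here: the sketch uses the n - 1 divisor for the standard deviation and variance, and reports CV as a ratio (SD / AM) rather than a percentage.

```python
import math
import statistics


def basic_statistics(values):
    """Textbook definitions of the basic statistics (illustrative sketch)."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD, n - 1 divisor
    return {
        "N": n,
        "minimum": min(values),
        "maximum": max(values),
        "sum": sum(values),
        "range": max(values) - min(values),
        "arithmetic mean": mean,
        "geometric mean": statistics.geometric_mean(values),  # positive data only
        "harmonic mean": statistics.harmonic_mean(values),
        "median": statistics.median(values),
        "standard deviation": sd,
        "variance": statistics.variance(values),
        "CV": sd / mean,  # ratio, not a percentage
        "SE of AM": sd / math.sqrt(n),
    }


# PROTEIN for the 13 diet dinners in Table 3-1.
protein_diet = [22, 19, 18, 15, 9, 21, 14, 15, 17, 24, 13, 20, 12]
summary = basic_statistics(protein_diet)
print(summary["median"], round(summary["arithmetic mean"], 1))  # prints 17 16.8
```

These two values agree with the median (17) and mean (16.8) reported for the diet dinners later in this chapter.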

■ From the menus choose:
Analyze
Basic Statistics...

■ In the Analyze: Basic Statistics dialog box, select all of the variables in the source
list (only numeric variables are available for this feature), and click OK to calculate
the default statistics.
[Figure: the Basic Statistics dialog box]
                   | CALORIES
-------------------+---------
N of Cases         |   28.000
Minimum            |  160.000
Maximum            |  550.000
Arithmetic Mean    |  303.214
Standard Deviation |   87.815

                   | IRON
-------------------+---------
N of Cases         |   28.000
Minimum            |    2.000
Maximum            |   25.000
Arithmetic Mean    |   10.464
Standard Deviation |    5.467

For each variable, SYSTAT gives the number of cases with nonmissing values, the
largest and smallest values, and the mean and standard deviation. CALORIES for a
single dinner range from 160 to 550, with an average of around 300 (303.214 to be
exact). VITAMINA ranges from 0% to 100% with a mean of 18.9%. Since the mean is far
from the middle of the range (50%), the distribution must be quite skewed or have a
few extreme values.


Statistics By Group 


You can use By Groups on the Data menu to stratify the analysis.


■ From the menus choose:
Data
By Groups...


■ In the By Groups dialog box, select DIET$ as the variable, and click OK.


■ Return to the Basic Statistics dialog box.
■ Select the following measures: N, Minimum, Maximum, Arithmetic mean (AM), CI
of AM, and Median.
■ Click OK.
[Output: N of Cases, Minimum, Maximum, Median, Arithmetic Mean, and 95.0% lower and
upper confidence limits of the mean for each variable, reported separately for
DIET$ = yes and DIET$ = no]


The median grams of protein for the 13 diet dinners is 17; the mean is 16.8. For the 15
regular dinners, these statistics are 22 and 22.1, respectively. Later we will request a
two-sample t test to see whether this difference is significant. A 95% confidence
interval for the average cost of a diet dinner ranges from $2.27 to $2.75. The confidence
interval for the average cost of the regular dinners is wider: $2.21 to $2.94.
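A 95% confidence interval for a mean is usually computed as mean ± t × s / √n, where t is the 0.975 quantile of Student's t distribution with n - 1 degrees of freedom. The sketch below assumes SYSTAT uses this standard formula and checks it against the reported interval for the cost of the 13 diet dinners. Python's standard library has no t quantile function, so the critical value (about 2.1788 for 12 degrees of freedom) is passed in; `scipy.stats.t.ppf(0.975, 12)` returns it if SciPy is available.

```python
import math
import statistics


def mean_confidence_interval(values, t_critical):
    """Two-sided interval mean +/- t * s / sqrt(n).

    `t_critical` is the upper quantile of Student's t with n - 1 degrees
    of freedom (0.975 quantile for a 95% interval).
    """
    n = len(values)
    mean = statistics.mean(values)
    half_width = t_critical * statistics.stdev(values) / math.sqrt(n)
    return mean - half_width, mean + half_width


# COST for the 13 diet dinners in Table 3-1.
cost_diet = [2.99, 2.99, 2.99, 2.15, 2.15, 2.79, 2.79, 2.79,
             2.00, 2.00, 2.49, 2.00, 2.49]
low, high = mean_confidence_interval(cost_diet, t_critical=2.178813)
print(f"{low:.3f} to {high:.3f}")  # prints 2.265 to 2.754
```

The interval is wider for the regular dinners mainly because their costs vary more (from $1.60 to $3.50), which increases s.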

The BY GROUPS variable, DIET$, remains in effect for subsequent graphical 
displays and statistical analyses. To disengage it, return to the By Groups dialog box 
and select Turn off. 
A First Look at Relations among Variables


What are the correlations among calories, fat content, protein, and cost? We can use 
correlations to quantify the linear relations among these variables. 
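The Pearson correlation divides the covariance of two variables by the product of their standard deviations, so it ranges from -1 to 1 and measures only linear association. A minimal sketch (the function name `pearson` is illustrative, not a SYSTAT command):

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / n
    my = sum(y) / n
    # The n - 1 divisors of the covariance and the variances cancel,
    # so plain sums of products suffice.
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

For example, `pearson([1, 2, 3, 4], [2, 4, 6, 8])` returns 1.0 (up to rounding), because the points lie exactly on a rising line.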


■ From the menus choose:
Analyze
Correlations
Simple...


■ In the Simple Correlations dialog box, select Continuous data as the data type and
select Pearson from the Continuous data drop-down list.


■ Select CALORIES, FAT, PROTEIN, and COST as the variables.


[Figure: the Main tab of the Simple Correlations dialog box]


m Click the Options tab and select Probabilities and Bonferroni. Because we test six
correlations among four variables, we use Bonferroni-adjusted probabilities to
protect against false positives from multiple tests.
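The Bonferroni adjustment multiplies each unadjusted probability by the number of tests, m, and caps the result at 1. With four variables there are m = 4 × 3 / 2 = 6 distinct correlations. A minimal Python sketch of that arithmetic; the p-values in the example are illustrative, not SYSTAT output:

```python
from math import comb


def n_pairs(k):
    """Number of distinct correlations among k variables: k choose 2."""
    return comb(k, 2)


def bonferroni(p, m):
    """Bonferroni-adjusted probability: p times the number of tests, capped at 1."""
    return min(1.0, p * m)


m = n_pairs(4)  # CALORIES, FAT, PROTEIN, COST
print(m, bonferroni(0.002, m), bonferroni(0.4, m))
```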


[Dialog box: Analyze: Correlations: Simple, Options tab]

m Click OK.

Number of Observations: 28 

Means 

CALORIES FAT PROTEIN COST 


303.214 10.804 19.679 2.544
Pearson Correlation Matrix

           CALORIES      FAT  PROTEIN     COST
CALORIES      1.000
FAT           0.758    1.000
PROTEIN       0.550    0.279    1.000
COST          0.099   -0.132    0.420    1.000




Bartlett's Chi-square Statistic : 38.865
p-value : 0.000

Matrix of Bonferroni Probabilities

PROTEIN COST

0
0.156 0.000
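Bartlett's chi-square tests the hypothesis that all correlations among the variables are zero. Its usual form is χ² = −(n − 1 − (2p + 5)/6) ln|R|, with p(p − 1)/2 degrees of freedom, where R is the p × p correlation matrix. The Python sketch below assumes SYSTAT uses this standard formula; with the rounded correlations above (n = 28, p = 4) it gives a value close to the printed 38.865. The p-value uses the closed-form chi-square tail, which holds only for an even number of degrees of freedom.

```python
import math


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-15:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def bartlett_sphericity(corr, n):
    """Chi-square statistic and df for H0: all correlations are zero."""
    p = len(corr)
    stat = -(n - 1 - (2 * p + 5) / 6) * math.log(det(corr))
    return stat, p * (p - 1) // 2


def chi2_sf_even(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df % 2:
        raise ValueError("closed form needs an even df")
    half = x / 2
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


R = [[1.000, 0.758, 0.550, 0.099],
     [0.758, 1.000, 0.279, -0.132],
     [0.550, 0.279, 1.000, 0.420],
     [0.099, -0.132, 0.420, 1.000]]
stat, df = bartlett_sphericity(R, n=28)
print(f"chi-square = {stat:.3f}, df = {df}, p = {chi2_sf_even(stat, df):.2e}")
```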


The output above also includes a Quick Graph, which SYSTAT automatically generates
when you request correlations. Quick Graphs are available for most statistical
procedures. To turn off Quick Graphs, use Options on the Edit menu.
The Quick Graph in this example is a scatterplot matrix (SPLOM). There is one
bivariate scatterplot for each entry in the correlation matrix. Univariate histograms
for each variable are displayed along the diagonal, and 75% normal-theory confidence
ellipses are displayed within each plot.

The plot of FAT and CALORIES (top left) has the narrowest ellipse and thus the
strongest correlation (provided that the points are spread evenly, the relationship is
linear, and there are no anomalies).
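Why a narrow ellipse means a strong correlation: for two standardized variables with correlation r, the 2 × 2 correlation matrix has eigenvalues 1 + r and 1 − r, so the ellipse's half-axes are proportional to √(1 + |r|) and √(1 − |r|). The Python sketch below assumes the ellipse is the bivariate-normal contour whose squared Mahalanobis distance equals the chi-square quantile with 2 degrees of freedom, c² = −2 ln(1 − 0.75) ≈ 2.77 for a 75% ellipse; SYSTAT's exact scaling may differ.

```python
import math


def ellipse_half_axes(r, coverage=0.75):
    """Half-axis lengths of a normal-theory ellipse for standardized variables.

    Assumes the contour where the squared Mahalanobis distance equals the
    chi-square(2) quantile, c^2 = -2 ln(1 - coverage).
    """
    c = math.sqrt(-2 * math.log(1 - coverage))
    # Eigenvalues of [[1, r], [r, 1]] are 1 + r and 1 - r.
    return c * math.sqrt(1 + abs(r)), c * math.sqrt(1 - abs(r))


for pair, r in [("FAT-CALORIES", 0.758), ("COST-CALORIES", 0.099)]:
    major, minor = ellipse_half_axes(r)
    print(f"{pair}: major {major:.2f}, minor {minor:.2f}, ratio {minor / major:.2f}")
```

The minor-to-major ratio shrinks toward 0 as |r| approaches 1, which is what makes the FAT-CALORIES ellipse the narrowest.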


In the Pearson correlation matrix above, the correlation between FAT and CALORIES is
0.758. The p-value (Bonferroni-adjusted probability) associated with 0.758 is printed
as 0.000 (that is, less than 0.0005). As the scatterplot suggested, FAT and CALORIES
are strongly correlated. PROTEIN also has a significant correlation with CALORIES
(r = 0.55, p-value = 0.014). We are unable to detect significant correlations between
COST and CALORIES, FAT, or PROTEIN.
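Each unadjusted probability for a Pearson correlation comes from the statistic t = r√(n − 2)/√(1 − r²) with n − 2 degrees of freedom; the Bonferroni step then multiplies it by the number of tests. The Python sketch below assumes this standard test and approximates the t tail area with Simpson's rule instead of a library call. For r = 0.55 and n = 28 it gives an adjusted probability close to the printed 0.014; the printed r is rounded, so the last digit can differ.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_two_sided_p(t, df, steps=2000):
    """P(|T| > |t|) = 1 - 2 * integral of the density from 0 to |t|."""
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps  # Simpson's rule needs an even number of steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1 - 2 * total * h / 3)


def corr_test(r, n, m_tests=1):
    """t statistic, unadjusted p, and Bonferroni-adjusted p for a Pearson r."""
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
    p = t_two_sided_p(t, n - 2)
    return t, p, min(1.0, p * m_tests)


t, p, p_adj = corr_test(0.55, 28, m_tests=6)  # PROTEIN vs CALORIES
print(f"t = {t:.3f}, p = {p:.4f}, Bonferroni p = {p_adj:.3f}")
```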


Subpopulations 


The presence of subpopulations can mask or falsely enhance the size of a correlation.
With Correlations, we could specify DIET$ as a BY GROUPS variable as we did
previously. Instead, let us examine the data graphically and use 75% nonparametric
kernel density contours to identify the diet yes and no groups. We will also look at
univariate kernel density curves for the groups.
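A kernel density estimate places a small smooth bump on every observation and averages the bumps; the bandwidth controls how smooth the resulting curve is. SYSTAT's kernel and bandwidth choices may differ: the sketch below assumes a Gaussian kernel and Silverman's rule-of-thumb bandwidth, and the data are hypothetical. It corresponds to the one-dimensional curves drawn on the diagonal; the 75% contours are the two-dimensional analogue.

```python
import math


def gaussian_kde(data, bandwidth):
    """Return a function estimating the density at x with a Gaussian kernel."""
    n = len(data)
    norm = 1 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)

    return density


def silverman_bandwidth(data):
    """Silverman's rule of thumb, 1.06 * sd * n^(-1/5) (an assumed choice)."""
    n = len(data)
    mean = sum(data) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in data) / (n - 1))
    return 1.06 * sd * n ** (-0.2)


# Hypothetical calorie values for one group.
calories = [240, 250, 270, 280, 300, 310, 330]
f = gaussian_kde(calories, silverman_bandwidth(calories))
print(round(f(280), 5))
```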


m From the menus choose:

Graph
Scatterplot Matrix (SPLOM)...

m Select CALORIES, FAT, PROTEIN, and COST as the Row variables.
m Select DIET$ as the Grouping variable.
m Select Kernel Curve from the drop-down list for Density displays in diagonal cells.
m Select Only display bottom half of matrix and diagonal and Overlay multiple graphs
into a single frame.
[Dialog box: Graph: Scatterplot Matrix (SPLOM), Main tab]


m Click the Options tab in the Scatterplot Matrix dialog box. 
m Select Confidence kernel and enter the value of p as 0.75. 
[Dialog box: Graph: Scatterplot Matrix (SPLOM), Options tab]


m Click OK. 
[Figure: Scatterplot matrix of CALORIES, FAT, PROTEIN, and COST grouped by DIET$ (no, yes)]


For CALORIES and FAT, look at the separation of the univariate densities on the 
diagonal of the display. Notice that the price range (COST) at the bottom right for the 
diet dinners is within that for the regular dinners. COST is the Y-variable in the bottom 
row of plots. Within each group, COST appears to have little relation to CALORIES or 
FAT. It is possible that COST has a positive association with PROTEIN for the regular 
dinners (open circles in the COST versus PROTEIN plot). 

Is there a relationship between cost and nutritive value as measured by the 
percentage daily value for vitamin A, calcium, and iron? Repeat the steps for the 
previous plot, but select VITAMINA, CALCIUM, IRON, and COST as the row variables.
[Figure: Scatterplot matrix of VITAMINA, CALCIUM, IRON, and COST grouped by DIET$]


COST is the Y-variable for each plot on the bottom row. There is no strong relationship 
between cost and nutritive value (as measured by VITAMINA, CALCIUM, and IRON), 
but there is a small cluster of low-cost dinners with high-calcium content. Later, we 
will find that these are pasta dinners. 


3-D Displays 


In this section, we use 3-D displays for another look at calories, protein, and fat. In the 
display on the left, we label each dinner with its brand code; in the display on the right, 
we use the cost of the dinner to determine the size of the plot symbol. 

To produce 3-D displays: 

m From the menus choose: 


Graph 
Scatterplot... 


m In the Scatterplot dialog box, select FAT as the X-variable, PROTEIN as the
Y-variable, and CALORIES as the Z-variable.

m Select Display grid lines in the X-Axis, Y-Axis, and Z-Axis tabs.
m Click the Options tab and select Vertical spikes to Y from the Connectors/partitions 
group. 
m To produce the plot on the left, click the Symbol and Label tab, click Display case
labels in the Case labels group, and select BRAND$ to label each plot point with
the brand of the dinner.

m To produce the plot on the right, click the Symbol and Label tab, click Select 
variable in the Symbol size group, and select COST as the symbol size variable. 


[Figure: 3-D scatterplots of FAT, PROTEIN, and CALORIES; left, points labeled by BRAND$; right, symbol size set by COST]

Notice the back corner of the display on the left—the tallest spike extends to sw, 
indicating the dinner with the most calories. On the floor of the display, we read that 
its fat content is between 20 and 30 grams and that its protein is a little over 20 grams. 
We see this same point in the display on the right—the size of its circle is not extreme, 
indicating a mid-range price. Notice the small circle toward the far right—this dinner 
costs much less than the sw dinner and has a higher fat content and a similar protein 
value. The most expensive dinners (that is, the larger circles) do not concentrate in a 
particular region. 


A Two-Sample t-Test 


One of the most common situations in statistical practice involves comparing the 
means for two groups. For example, does the average response for the treatment group 
differ from that for the control group? Ideally, the subjects should be randomly 
assigned to the groups. 

For the food data, we are interested in possible differences in PROTEIN and
CALCIUM between the diet and regular dinners. These dinners are not randomly
assigned to the two groups, so, as in any observational study, a researcher should
carefully explore the data to ensure that other factors are not masking or enhancing
a difference in means.


In the t-test, we test the null hypothesis

H0: The means of the diet and regular dinners are equal.

The alternative to this hypothesis could be

H1: The mean of the diet dinners is "greater" than the mean of the regular dinners, or

H1: The mean of the diet dinners is "not equal" to the mean of the regular dinners, or

H1: The mean of the diet dinners is "less" than the mean of the regular dinners.

Since we have no prior reason to expect a difference in a particular direction, let us
choose the second alternative, H1: the mean of the diet dinners is "not equal" to the
mean of the regular dinners. In other words, do diet and regular dinners differ in
protein and calcium content? In this example, we use the t-test procedure.


m From the menus choose: 


Analyze 
Hypothesis Testing 
Mean 
Two Sample t-Test... 


m In the Two-Sample t-Test dialog box, select PROTEIN and CALCIUM as the
variables, and select DIET$ as the grouping variable.


m For Alternative type, choose 'not equal'.
m Click OK. 
[Dialog box: Two-Sample t-Test]
Two-sample t-test on PROTEIN Grouped by DIET$ vs Alternative = 'not equal'

GROUP |  N    Mean   Standard Deviation
------+---------------------------------
no    |
yes   |

Separate Variance

Difference in Means        :
95.00% Confidence Interval :
t                          :
df                         :
p-value                    :

Pooled Variance

Difference in Means        : 5.287
95.00% Confidence Interval : 1.922 to 8.653
t                          : 3.229
df                         : 26.000
p-value                    : 0.003
Two-sample t-test on CALCIUM Grouped by DIET$ vs Alternative = 'not equal'

GROUP |  N    Mean   Standard Deviation
------+---------------------------------
no    | 15  11.800   12.757
yes   | 13   9.769    8.506

Separate Variance

Difference in Means        : 2.031
95.00% Confidence Interval : -6.322 to 10.384
t                          : 0.501
df                         : 24.520
p-value                    : 0.621

Pooled Variance

Difference in Means        : 2.031
95.00% Confidence Interval : -6.538 to 10.600
t                          : 0.487
df                         : 26.000
p-value                    : 0.630


[Figure: Two-sample t-test Quick Graphs for PROTEIN and CALCIUM]


The t-test procedure produces two density plots as Quick Graphs. On the far left and
right sides of the density plot for each test variable are box plots for each category of
the grouping variable. The box plot on the left side of each graph is for the DIET$ no
group, and the box plot on the right side of each graph is for the DIET$ yes group.

The middle portion of each graph shows the actual distribution of data points, with 
a normal curve for comparison. 

The box plots for PROTEIN look well behaved. The median (horizontal line in each
box) is in the center of the box, and the lengths of the boxes are similar. Also, the
peaks of the normal curves, which represent the mean of a normal distribution, are
very close to the median values. This suggests that the distributions are roughly
symmetric and have approximately the same spread (variance). This is not true for
CALCIUM: these distributions are right-skewed and should possibly be transformed
before analysis.

The mean values for PROTEIN are the same as those in the By Groups statistics:
22.133 and 16.846. The standard deviations differ little (4.307 and 4.337), confirming
what we observed in the box plots. This means that we can use the results of the
pooled-variance t test printed below the means. This test is usually the first one you see
in introductory texts and assumes that the two groups have equal variances. For
PROTEIN, we conclude that the mean of 22.1 for the regular dinners differs significantly
from the mean of 16.8 for the diet dinners (t = 3.229, p-value = 0.003).
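The pooled-variance t statistic divides the difference in means by its standard error, computed from a single variance estimate pooled over both groups, with n1 + n2 − 2 degrees of freedom. A Python sketch working from the PROTEIN summary statistics quoted above reproduces t = 3.229 with 26 df. The text does not say which standard deviation belongs to which group, so the pairing below is an assumption; swapping it changes t by less than 0.002.

```python
import math


def pooled_t(n1, mean1, sd1, n2, mean2, sd2):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df  # pooled variance
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df


# Regular (no) first, then diet (yes); the SD pairing is assumed.
t, df = pooled_t(15, 22.133, 4.307, 13, 16.846, 4.337)
print(f"t = {t:.3f}, df = {df}")
```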

The separate-variance t test does not require the assumption of equal variances.
Considering the distributions for CALCIUM displayed in the box plots and that the
standard deviations for the groups are 12.757 and 8.506, we use the separate-variance
t test results. We are unable to report a difference in average CALCIUM values for the
regular and diet dinners (t = 0.501, p-value = 0.621).
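The separate-variance (Welch) statistic uses each group's own variance, and its degrees of freedom come from the Welch-Satterthwaite approximation, which is why df is fractional (24.520) rather than an integer. A Python sketch using the CALCIUM summaries from the output reproduces both numbers:

```python
import math


def welch_t(n1, mean1, sd1, n2, mean2, sd2):
    """Separate-variance (Welch) t statistic with Welch-Satterthwaite df."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2  # squared standard errors
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# CALCIUM: regular (no) and diet (yes) dinners.
t, df = welch_t(15, 11.800, 12.757, 13, 9.769, 8.506)
print(f"t = {t:.3f}, df = {df:.3f}")
```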

The discussion of SYSTAT’s procedures is very exploratory at this stage, so you 
should not conclude that CALCIUM values are homogeneous. Always take the time to 
think about what possible subgroups might be influencing or obscuring results. 


A One-Way Analysis of Variance (ANOVA) 


Does the cost of a dinner vary by brand? Let us try an analysis of variance (ANOVA) 
to determine whether the average price of frozen dinners varies by brand. After looking 
at the graphics earlier in this chapter, we assume that differences do exist, so we also 
request the Tukey HSD test for post hoc comparison of means. This test provides
protection for testing many pairs of means simultaneously, allowing us to make
statements about which brand's average cost differs significantly from another brand's.
Before we run the analysis of variance, we will specify how the brands should be 
ordered in the output (results will be easier to follow if we order the brands from least 
to most expensive). 
m From the menus choose: 
Data 

Order of Display... 
m In the Order dialog box, select BRAND$ as the variable.
m Select Enter sort and type 'gor', 'hc', 'sw', 'lc', 'ww', 'st', 'ty'.
m Click OK. 
m From the menus choose:


Edit 
Options... 


m In the Output Results group on the Output tab, select Long from the Length
drop-down list. (This will provide extended results for the analysis of variance.)


m Click OK. 
To request an analysis of variance: 


m From the menus choose:


Analyze 
Analysis of Variance 
Estimate Model... 


m In the Analysis of Variance: Estimate Model dialog box, select COST as the
dependent variable and BRAND$ as the factor variable.


m Click OK. 


[Dialog box: Analysis of Variance: Estimate Model]
Effects coding used for categorical variables in model.
Categorical values encountered during processing are

Variables          | Levels

Dependent Variable |
N                  |

Multiple R         |
Squared Multiple R |

Analysis of Variance

Source | Type III SS  df  Mean Squares  F-ratio  p-value
-------+-----------------------------------------------
BRAND$ |       6.017   6         1.003   10.042    0.000
Error  |       2.097  21         0.100

Least Squares Means

Factor | Level  LS Mean  Standard Error  N


Notice that the means are ordered by increasing cost because of the Order of Display
setting. This ordering also applies to graphical displays.
m From the menus choose: 


Graph 
Bar... 


m Select BRAND$ as the X-variable and COST as the Y-variable.
[Dialog box: Graph: Bar Chart]


m Click the Error Bars tab and select Standard error from the type group. 
m Click the Fill tab, select Select fill from the Fill pattern group, and select [_] as the
Fill Pattern.
[Dialog box: Graph: Bar Chart, Fill tab]


m Click OK. 


[Figure: Bar chart of mean COST by BRAND$ with standard error bars]
The F-ratio in the Analysis of Variance table at the beginning of the output indicates
that average price differs among the seven brands, that is, at least one brand's mean
differs from the others (F-ratio = 10.0415, p-value < 0.0005).
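The F-ratio is the effect mean square divided by the error mean square, where each mean square is a sum of squares divided by its degrees of freedom. A quick Python check using the rounded sums of squares from the table; because the inputs are rounded, the result agrees with the printed 10.0415 only to about two decimals. The same arithmetic applies to each row of a two-way table.

```python
def f_ratio(ss_effect, df_effect, ss_error, df_error):
    """F = (SS_effect / df_effect) / (SS_error / df_error)."""
    ms_effect = ss_effect / df_effect
    ms_error = ss_error / df_error
    return ms_effect / ms_error


print(round(f_ratio(6.017, 6, 2.097, 21), 3))  # BRAND$ effect on COST
```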


Tukey Pairwise Mean Comparisons 


Let us use SYSTAT's advanced hypothesis testing capability to request Tukey's
pairwise mean comparison test.
m From the menus choose:


Analyze 
Analysis of Variance 
Pairwise Comparisons... 


m Specify BRAND$ under Groups and select Tukey under Tests. 


[Dialog box: Analyze: Analysis of Variance (ANOVA): Pairwise Comparisons]
m Click OK.


Post Hoc Test of COST 
Using least squares means. 
Using model MSE of 0.100 with 21 df. 


Tukey's Honestly-Significant-Difference Test 


BRAND$ (i)  BRAND$ (j)  Difference  p-value  95.0% Confidence Interval
                                             Lower     Upper

gor         hc                      0.984    -0.975     0.595
gor         sw                      0.590    -1.208     0.361
gor         lc                      0.010    -1.533    -0.155
gor         ww                      0.009    -1.549    -0.171
gor         st                      0.001    -1.831    -0.379
gor         ty                      0.000    -2.166    -0.714
hc          sw                      0.968    -1.072     0.605
hc          lc                      0.115    -1.404     0.096
hc          ww                      0.100    -1.420     0.080
hc          st                      0.016    -1.700    -0.130
hc          ty                      0.001    -2.035    -0.465
sw          lc                      0.548    -1.171     0.330
sw          ww                      0.506    -1.187     0.314
sw          st                      0.117    -1.466     0.103
sw          ty                      0.006    -1.801    -0.232
lc          ww                      1.000    -0.666     0.634
lc          st                      0.874    -0.950     0.428
lc          ty                      0.120    -1.285     0.093
ww          st                      0.903    -0.934     0.444
ww          ty                      0.138    -1.269     0.109
st          ty          -0.335      0.742    -1.061     0.391


Let us read the Tukey results above. The first two columns identify the pair of
brands, and the third column gives the difference in mean cost for that pair.
Differences between the gor brand and the others are reported in column 3 ($0.19 with
hc, $0.42 with sw, and $1.44 with ty). The fourth column reports the probability
associated with each difference. Gor is significantly less expensive than all brands
except hc and sw.


In column 3, notice that, on average, the hc brand costs $0.915 less than the st brand
and $1.25 less than the ty brand. From the p-value column, these differences are
significant, with probabilities of 0.015650 and 0.000672, respectively. The only other
significant difference is that the sw brand costs, on average, $1.02 less than the
ty brand.
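Tukey intervals are symmetric about the estimated difference (difference plus or minus a studentized-range multiple of its standard error), so each difference quoted above can be recovered from the printed limits as their midpoint. A small Python check using three rows of the table:

```python
def midpoint(lower, upper):
    """Center of a symmetric confidence interval, i.e., the estimated difference."""
    return (lower + upper) / 2


rows = {("gor", "hc"): (-0.975, 0.595),
        ("hc", "st"): (-1.700, -0.130),
        ("sw", "ty"): (-1.801, -0.232)}
for (i, j), (lo, hi) in rows.items():
    print(f"{i} - {j}: {midpoint(lo, hi):.4f}")
```

The midpoints match the $0.19, $0.915, and $1.02 differences discussed in the text.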


A Two-Way ANOVA with Interaction 


Do nutrients vary by type of food? Earlier, in a scatterplot matrix, we observed a small
cluster of dinners that had higher calcium values than the others. In the two-sample
t-test, we were unable to detect differences in average calcium values between the diet
and regular dinners. Let us explore further by using both food type and dinner type to
define cells, that is, we request a two-way analysis of variance. Using the Counts
feature in Two-Way Tables, we found that although our sample has beef, chicken, and
pasta dinners, there were no beef dinners in the DIET$ yes group. (SYSTAT can
analyze ANOVA designs with missing cells. See SYSTAT, Statistics II, Chapter 3 for
more information.)

Let us use Select Cases on the Data menu to omit the beef dinners, and then request 
an analysis of variance for a two-by-two design (DIET$ yes and no by chicken and 
pasta). 

m From the menus choose: 


Data 
Select Cases... 


m In the Select dialog box, select FOOD$ as Expression1.
m Select <> (not equal) from the drop-down list of operators.

m For Expression2, type beef. (When working with commands, enclose the value in
quotation marks, as in 'beef'; the dialog box adds them for you.)


m Click OK. 
To get a bar chart of the cell means:

m From the menus choose:

Graph
Bar...

m Select CALCIUM as the Z-variable, DIET$ as the Y-variable, and FOOD$ as the
X-variable.

m Click the Error Bars tab and select none from the Type group.

m Click the Fill tab, select Select fill from the Fill pattern group, and select a solid
fill pattern.
[Dialog box: Graph: Bar Chart]

[Figure: 3-D bar chart of mean CALCIUM by FOOD$ and DIET$]
Suggestion. Try using the Dynamic Explorer to rotate this 3-D bar chart.


The box plots in the two-sample t-test example show that the distributions of calcium
for the yes and no groups are skewed and have unequal spreads. Let us use a square root
transformation of CALCIUM to make its distribution more symmetric.
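A square root compresses large values more than small ones, so it pulls in a long right tail. The Python sketch below measures this with sample skewness (the average cubed deviation divided by the cubed standard deviation, without bias correction); the calcium values are hypothetical, not the sample data.

```python
import math


def skewness(values):
    """Sample skewness m3 / m2**1.5 using population moments (no bias correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed calcium values (% daily value).
calcium = [2, 4, 4, 6, 8, 10, 15, 25, 40]
root = [math.sqrt(v) for v in calcium]
print(round(skewness(calcium), 3), round(skewness(root), 3))
```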


Before requesting the analysis of variance, we will transform CALCIUM, taking the 
square root of each value. 
m From the menus choose:


Data 
Transform 
Let... 
m In the Let dialog box, select CALCIUM as the variable, select SQR from the list of
mathematical functions, and select CALCIUM from the variable list and add it to the
expression. The Expression box should now look like this: SQR(CALCIUM).

[Dialog box: Data: Transform: Let]


m Click OK. 
m Now request the analysis of variance, repeating the steps in the last example, except
that here we use CALCIUM as the dependent variable and both DIET$ and FOOD$ as
the factor variables.


Data for the following results were selected according to
SELECT (FOOD$ <> 'beef')

Effects coding used for categorical variables in model.
Categorical values encountered during processing are

Variables          | Levels
-------------------+----------------
DIET$ (2 levels)   | no       yes
FOOD$ (2 levels)   | chicken  pasta

Dependent Variable | CALCIUM
N                  | 22
Multiple R         | 0.804
Squared Multiple R | 0.647

CONSTANT    |                3.380
DIET$       | no             0.305
FOOD$       | chicken       -1.423
FOOD$       | !MISSING!     -0.639
DIET$*FOOD$ | no*chicken    -0.639
DIET$*FOOD$ | no*!MISSING!   0.000

Analysis of Variance

Source      | Type III SS  df  Mean Squares  F-ratio  p-value
------------+------------------------------------------------
DIET$       |       1.807   1         1.807    1.432    0.247
FOOD$       |      39.298   1        39.298   31.136    0.000
DIET$*FOOD$ |       7.908   1         7.908    6.266    0.022
Error       |      22.719  18         1.262

Least Squares Means

Factor | Level  LS Mean  Standard Error       N
-------+---------------------------------------
DIET$  | no       3.685           0.397   9.000
DIET$  | yes      3.074           0.320  13.000

Least Squares Means

Factor | Level    LS Mean  Standard Error       N
-------+-----------------------------------------
FOOD$  | chicken    1.956           0.303  14.000
FOOD$  | pasta      4.803           0.410   8.000

Least Squares Means

Factor      | Level         LS Mean  Standard Error       N
------------+-----------------------------------------------
DIET$*FOOD$ | no*chicken
DIET$*FOOD$ | no*pasta        5.747  3.000
DIET$*FOOD$ | no*!MISSING!    3.685  0.000




DIET$*FOOD$ | yes*chicken     2.289           0.397   8.000
DIET$*FOOD$ | yes*pasta       3.859           0.502   5.000

Durbin-Watson D Statistic   | 1.754
First Order Autocorrelation | 0.094

Information Criteria

AIC             | 73.140
AIC (Corrected) | 76.890
Schwarz's BIC   | 78.596


The significant DIET$ by FOOD$ interaction suggests exercising caution when
interpreting main effects. The main effect for DIET$ does not appear to be significant
(p-value = 0.247), but let us look at a scatterplot and see if that tells us anything more.
m From the menus choose: 


Graph 
Scatterplot... 


m Select CALCIUM as the Y-variable and DIET$ as the grouping variable. (SYSTAT
will automatically use the case number as the X-variable.)

m Select Overlay multiple graphs into a single frame.


m Click the Symbol and Label tab, click Select symbol, select a circle for the first 
symbol and a triangle for the second. 

m Check Display case labels in the Case labels group and select FOOD$ as the case
label variable.

m Click the Fill tab, click Select fill in the Fill pattern group, and select a solid fill for 
both the first and second fill patterns. 


m Click OK. 
The scatterplot shows that all of the dinners with a square root value for CALCIUM
over 4 are pasta dinners, which is consistent with the significant main effect for
FOOD$. But it also shows that the highest values are for regular (DIET$ = no)
dinners. This suggests that further investigation might be warranted.


Bonferroni Pairwise Mean Comparisons 


Since we have a significant DIET$ by FOOD$ interaction, we should be cautious about
interpreting main effects. Let us use SYSTAT's advanced hypothesis testing capability
to request Bonferroni adjusted probabilities for tests of pairwise mean differences.


m From the menus choose: 


Analyze 
Analysis of Variance 
Pairwise Comparisons... 


m Specify DIET$ * FOOD$ under Groups and select Bonferroni under Tests.
m Click OK. 


Post Hoc Test of CALCIUM 


Using least squares means 
Using model MSE of 1.262 with 18 df. 
Bonferroni Test

DIET$(i)*FOOD$(i)  DIET$(j)*FOOD$(j)  Difference  p-value  95.0% Confidence Interval
                                                           Lower     Upper
no*chicken no*pasta -4.124 0.001 -6.663 -1.585 
no*chicken yes*chicken n . 0.000 0.000 
no*chicken yes*pasta -0.667 1.000 -2.606 1.273 
no*chicken -2.236 0.041 -4.411 -0. 
no*pasta yes*chicken 5 ; 0.000 0.000 
no*pasta yes*pasta 3.457 0.003 1.026 5. 
no*pasta 1.888 0.336 -0.735 4. 
yes*chicken yes*pasta . . 0.000 0 
yes*chicken . . 0.000 0.000 
yes*pasta -1.570 0.247 -3.617 0 


We are interested in four of the six differences (and probabilities) in this table. First
we look within diet types and then within food types.

For the regular meals (DIET$ no), the difference in average CALCIUM content between
chicken and pasta meals is highly significant (the difference in square root units is
4.124, p-value = 0.001).

For the diet meals (DIET$ yes), the difference in average CALCIUM content between
chicken and pasta is not significant (1.570, p-value = 0.247).

For the pasta meals, the difference in average CALCIUM content between the DIET$ yes
and no groups is not significant (-1.888, p-value = 0.336).

For the chicken meals, the difference in average CALCIUM content between the DIET$
yes and no groups is not significant (0.667, p-value = 1.000).

These comparisons are easier to see in a dot display of the means.

m From the menus choose:

Graph
Summary Charts
Dot...


m Choose CALCIUM as the Y-variable and DIET$ as the X-variable.
m Specify FOOD$ as the grouping variable.
m Select Overlay multiple graphs into a single frame.

m Click the Error Bars tab, choose Standard error from the Type group, and specify
a value of 0.9545.

m Click the Options tab and select Line connected in left-to-right order.
m Click OK.
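The value 0.9545 is the probability that a normally distributed value lies within two standard deviations of its mean. Assuming SYSTAT treats the entry as a normal-theory coverage, the error bars therefore extend about two standard errors on either side of each mean. A quick check with Python's standard library:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal
coverage = z.cdf(2) - z.cdf(-2)  # P(-2 < Z < 2)
half_width = z.inv_cdf((1 + 0.9545) / 2)  # multiplier implied by 0.9545
print(round(coverage, 4), round(half_width, 3))
```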
For the regular meals (DIET$ no), the error bars do not overlap, indicating a significant
difference in calcium content between pasta and chicken. However, for the diet meals
(DIET$ yes), the overlapping error bars suggest no significant difference between the
meal types.
Focusing on the pasta meals, the average calcium content for the diet meals is
within two standard errors of the average calcium content for the regular meals.
Similar observations can be made for the chicken meals.


Summary 


The first step in any data analysis is to look at your data. SYSTAT provides a wide
variety of graphs that can help you identify possible relationships between variables,
spot outliers that may unduly affect results, and reveal patterns that may suggest data
transformations for more meaningful analysis.

SYSTAT also provides a wide variety of statistical procedures for analyzing your
data. We have covered some of the most common and basic statistical techniques in
this chapter, and we have still barely scratched the surface.
Data Analysis Quick Tour


This chapter provides a quick tour of SYSTAT’s capabilities, using data from a survey 
of uranium found in groundwater. 


Groundwater Uranium Overview 


The U.S. Department of Energy collected samples of groundwater in west Texas as 
part of a project to estimate the uranium reserves in the United States. Samples were 
taken from five different locations, called producing horizons, and then measured for 
various chemical components. In addition, the latitude and longitude for each sample 
location were recorded. Several questions are of interest: 

m Does the uranium concentration vary by producing horizon? 

m Is the presence of uranium correlated to the presence of other elements? 


m What is the overall geographic distribution of uranium in the area? 
The data for the groundwater uranium study are in the file GDWTRDM. Measurements
were recorded for the following variables:

Table 1:

Variable   Description
SAMPLE     The ID of the groundwater sample
LATITUDE   Latitude at which the sample was taken
LONGTUDE   Longitude at which the sample was taken
HORIZON$   Initials of producing horizon
HORIZON    ID of producing horizon
URANIUM    Uranium level in groundwater
ARSENIC    Arsenic level in groundwater
BORON      Boron level in groundwater
BARIUM     Barium level in groundwater
MOLYBDEN   Molybdenum level in groundwater
SELENIUM   Selenium level in groundwater
VANADIUM   Vanadium level in groundwater
SULFATE    Sulfate level in groundwater
TOT_ALK    Alkalinity of groundwater
BICARBON   Bicarbonate level in groundwater
CONDUCT    Conductivity of groundwater
PH         pH of groundwater
URANLOG    Log of uranium level in groundwater
MOLYLOG    Log of molybdenum level in groundwater

Potential Analyses 


The following kinds of analyses may be useful in analyzing the groundwater data:

Basic Statistics
Transformations
ANOVA
Nonparametric tests
Regression
Correlation
Cluster analysis
Discriminant analysis
Spatial statistics
Smoothing techniques such as kriging
Contour plotting


In these examples, we will show you descriptive graphs, ANOVA, nonparametric tests, 
smoothing and contour plotting. 


The Groundwater Data File 


The data for this analysis are in the file GDWTRDM. 
m To open the file, from the menus choose: 
File 
Open 
Data... 
m Select GDWTRDM, and click Open. 
Data files that are opened or imported can be viewed and edited in the Data Editor. You
can also see the results of variable transformations, case selections, and so forth in the
Data Editor. In this example, measurements were taken of the levels of uranium and
various other elements in the groundwater at each producing horizon. The measurements
for each variable can be viewed and manipulated directly in the Data Editor.
Graphics


Distribution Plot 


Since we will be looking extensively at uranium levels, it is a good idea to examine
the distribution of this variable and check whether it meets the assumptions of later
analyses.


To plot a histogram of URANIUM: 

m Click the Histogram icon in the Graph Toolbars.
m Choose URANIUM and add it to the X-variable(s) list. 
m Click OK. 

SYSTAT displays the following plot in the Graph Editor: 
We can see that the distribution of URANIUM is skewed. Many statistical analyses
assume approximately normal data, so ideally the histogram should look bell-shaped.
Exploring the Groundwater Data Interactively


The Graph Properties dialog box is a tool that allows you to explore data interactively, 
increasing the efficiency of your analysis. It can be used to modify features of a graph 
or frame or elements of the graph. 


m To open the Graph Properties dialog box, right-click the graph. A menu appears with
Properties at the top; click Properties to open the Graph Properties dialog box.
m Click the Axis tab in the Graph Properties dialog box and then select the Options
tab. Select Power in the Transform combo box. This enables the Power combo
box.


m Use the Down arrow key on the keyboard to change the power value of the X-axis
until the graph takes on a bell shape.


As you do this, SYSTAT automatically calculates the power transformation
URANIUM^power. A power of 0.5 is a square root transformation, and a power
of 0.333 is (approximately) a cube root transformation.


Transformed Graph 


At a power of 0, SYSTAT automatically performs a logarithmic transformation, that is,
log(URANIUM). The log transformation appears to produce a very good
bell-shaped curve. But this judgment is subjective, and more formal, objective methods
can be used to examine the normality of the transformed data.
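The power family behind the Transform setting can be sketched in a few lines. This is a minimal Python illustration, not SYSTAT code: the helper name `power_transform` is hypothetical, and it assumes strictly positive data and the natural logarithm for a power of 0.

```python
import math

def power_transform(values, power):
    """Apply x**power to each value; a power of 0 means log(x).

    Hypothetical helper: assumes positive data and the natural log for power 0.
    """
    if any(v <= 0 for v in values):
        raise ValueError("this sketch requires strictly positive values")
    if power == 0:
        return [math.log(v) for v in values]
    return [v ** power for v in values]

print(power_transform([1.0, 4.0, 9.0, 100.0], 0.5))  # [1.0, 2.0, 3.0, 10.0]
print(power_transform([1.0, math.e], 0))            # natural log: [0.0, 1.0]
```

Because 0.333 only approximates 1/3, the "cube root" chosen in the dialog box is itself a close approximation.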
Normally, once the proper transformation has been identified using the Graph
Properties dialog box, you create the transformed variable using the Data Editor. We
have already performed the transformation and included the variable URANLOG in the
data file for further statistical analysis.


Histograms and Probability Plots 


Let us take another look at the URANIUM distribution. We are going to plot two 
graphs, a histogram and a probability plot, by using commands. From the menus, 
submit the command file GDWTRIDM. For this: 

m From the menus choose:
File
Submit
File...

m Select GDWTRIDM from the ‘Miscellaneous’ subfolder of the ‘command’ 
directory and click Open. 


The following graphs are displayed in the Output Editor of the Viewspace:




Histogram for Uranium Probability Plot for Uranium 
In these plots, we get a first glimpse of SYSTAT’s color and overlay capabilities. This
command file created a side-by-side overlay of a histogram and a probability plot of
the URANIUM variable.


SYSTAT Windows and Commands 


SYSTAT gives you the flexibility to perform your analysis the way you want: 

m Windows interface: icons, menus, and dialog boxes.

m Typed commands: typing commands in the Commandspace.

m Batch (Untitled) command files: submitting files directly or from the
Commandspace.


Additionally, all menu actions can be optionally echoed to the Output Editor, allowing 
you to perform initial analyses using the menus, and then to cut and paste the 
commands into the Untitled tab of the Commandspace for repeated use. 


Plotting Several Graphs Using Commands


The commands in the file GDWTRIDM are: 


THICK 2
USE GDWTRDM
BEGIN
DENS URANIUM / HIST, FCOLOR = BLUE,
COLOR = GREEN, FILL,
TITLE='Histogram for Uranium'
PPLOT URANIUM / LOC = 6in,0in, FCOLOR = gray,
FILL, COLOR = YELLOW,
TITLE = 'Probability Plot for Uranium'
END
THICK 1


The DENS and PPLOT commands create the histogram and the probability plot, 
respectively. Between the BEGIN and END statements, we can change the data file in 
use and plot an unlimited number of graphs. Each graph can have its own attributes, 
such as location and color. 


Plotting Several Graphs Using Menus 


Plotting more than one graph can also be done directly from SYSTAT’s menus.


m From the menus choose:
Graph
Begin Overlay Mode

m Choose graphs and options from menus and dialog boxes. You can choose
locations for the graphs in the Layout tab, unless you want them overlaid on top of
one another.

m Then, from the menus choose:
Graph
End Overlay Mode (Display)
Transforming Data and Selecting Cases
In the Commandspace, select and submit the line beginning with PPLOT. Using the
Graph Properties dialog box in the Workspace, transform the URANIUM variable by
clicking the down arrow of X-Power until 0 is reached, yielding a log transformation.
Notice that the probability plot is much more linear.
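"More linear" can be made measurable by correlating the sorted data with normal scores; the closer the correlation is to 1, the straighter the probability plot. The Python sketch below assumes (i - 0.5)/n plotting positions, which may differ from SYSTAT's PPLOT, and uses a made-up lognormal sample rather than the groundwater data.

```python
import math
from statistics import NormalDist

def normal_scores(n):
    """Standard normal quantiles at the plotting positions (i - 0.5) / n."""
    nd = NormalDist()
    return [nd.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def probability_plot_r(values):
    """Correlation of the sorted data with normal scores (1.0 = straight line)."""
    return pearson(sorted(values), normal_scores(len(values)))

# Made-up right-skewed sample: exact lognormal quantiles, not the groundwater data
skewed = [math.exp(z) for z in normal_scores(50)]
logged = [math.log(v) for v in skewed]

print(round(probability_plot_r(skewed), 3))  # below 1: the points bend
print(round(probability_plot_r(logged), 3))  # essentially 1: a straight line
```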


Using SYSTAT’s lassoing capability, you can isolate outliers.
m Click the Lasso icon and lasso the two outliers at the lower left of the graph by
holding down the left mouse button and circling them.

m Click the Show Selection icon to highlight the selected cases.


Dynamically Highlighted Cases 


Cases selected by the Lasso tool are highlighted in the Data Editor. Click on the Data 
Editor to see these cases, 30 and 31, directly. 


SYSTAT dynamically links data across graphs and the Data Editor. These cases are
now selected. If you were to run a statistical analysis or plot another graph at this point,
it would use only these two cases. As pointed out earlier, SYSTAT manages data and
graphics globally.

Make sure you deselect the data before continuing. Otherwise the remainder of the 
analyses will be done only on the selected observations. To deselect the cases, use the 
Lasso tool to select an area of the graph that contains no data points. 


Connections between Graphs and the Data Editor 


For those of you with a technical inclination, here is the explanation of the connection
between the graphs and the Data Editor:

m Graphs have their own data, allowing the real-time transformations of the Graph
Properties dialog box and the ability to save and reload graphs without the original
data file.
m When a graph is plotted, the data in the graph are linked to the Data Editor,
allowing lassoing.
m The Data Editor and the program kernel share the same data set, so all data are
“live,” and what you see is what you get. For example, if you select data in the
Graph Editor and then run a regression, the regression applies only to the selected
data.
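The "live" selection behavior described above can be pictured as one shared table plus a set of selected case numbers that every analysis reads through. This is a toy Python model under that assumption only; the class `SharedData` and its methods are hypothetical and do not describe SYSTAT's internals.

```python
from dataclasses import dataclass, field

@dataclass
class SharedData:
    """Hypothetical shared data set: graphs, editor, and analyses all read it."""
    columns: dict
    selected: set = field(default_factory=set)  # 0-based case indices; empty = all

    def active_cases(self, name):
        """Values an analysis would use: selected cases only, if any are selected."""
        values = self.columns[name]
        if not self.selected:
            return list(values)
        return [v for i, v in enumerate(values) if i in self.selected]

def mean(values):
    return sum(values) / len(values)

data = SharedData({"URANLOG": [1.2, 0.8, 2.5, -1.9, -2.1]})  # made-up values
print(mean(data.active_cases("URANLOG")))  # all five cases
data.selected = {3, 4}                     # e.g. two cases picked with the lasso
print(mean(data.active_cases("URANLOG")))  # only the two selected cases
data.selected.clear()                      # deselect before continuing
```

Clearing the selection at the end mirrors the advice above to deselect before running further analyses.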


Statistics 


This part of the tour introduces SYSTAT’s statistics capability. Here, we explore the 
question of whether the five producing horizons have varying levels of uranium by 
performing an ANOVA of URANLOG (the log of URANIUM) versus HORIZON. This 
analysis is being done based on the visual judgment that the normal distribution for 
log(URANIUM) is a valid model. 


m In the SYSTAT window, click the ANOVA icon on the Statistics toolbar.
m Select URANLOG as the dependent variable and HORIZON as the factor.
m Click the Options tab.
m Select the Shapiro-Wilk option.
m Click OK.
ANOVA dialog box (Model, Repeated Measures, Options, and Resampling tabs) with URANLOG as the dependent variable


Graph of Mean Uranium Levels


Along with numeric output, SYSTAT produces a Quick Graph: a line-connected plot
of mean uranium levels and confidence intervals for the different producing horizons.
Most of SYSTAT’s statistical procedures have associated Quick Graphs. Quick
Graphs speed up analysis by providing immediate visual feedback on results. In this
Quick Graph, it is easy to see that the third group, Quartermaster, has a much higher
level of uranium.
Output for ANOVA 


The numeric output of the ANOVA appears in the Output Editor. 


Analysis of Variance 


Source  | Type III SS   df  Mean Squares  F-ratio  p-value
HORIZON |      14.978    4         3.744    3.252    0.014
Error   |     140.484  122         1.152


In the Analysis of Variance table, the F test has a p-value of 0.014. If the individual
producing horizons all had the same average level of uranium, an F-ratio at least this
large would occur only about 1.4% of the time, so we conclude (at the usual 5% level) that
the uranium level differs significantly by producing horizon. We saw this immediately in
the Quick Graph. In fact, in the Quick Graph we also saw that producing horizon 3, the
Quartermaster horizon, differs the most.
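The F-ratio in the table follows directly from the other columns: each mean square is a sum of squares divided by its degrees of freedom, and F is the ratio of the two mean squares. The Python sketch below recomputes it from the printed values; the function name `anova_f` is hypothetical. For the p-value you would also need the upper tail of an F distribution with 4 and 122 df (for example `scipy.stats.f.sf(f, 4, 122)` if SciPy is available), which should agree with the printed 0.014 up to rounding.

```python
def anova_f(ss_effect, df_effect, ss_error, df_error):
    """Return (MS_effect, MS_error, F) for a one-factor ANOVA table."""
    ms_effect = ss_effect / df_effect  # mean square = sum of squares / df
    ms_error = ss_error / df_error
    return ms_effect, ms_error, ms_effect / ms_error

# Values copied from the HORIZON and Error rows of the table above
ms_h, ms_e, f = anova_f(14.978, 4, 140.484, 122)
print(round(ms_e, 3), round(f, 3))  # 1.152 3.252
```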
Outliers and Diagnostics


The Output Editor also displays warnings about outliers.
*** WARNING ***
Case 30 is an Outlier (Studentized Residual : -4.732)
Case 31 is an Outlier (Studentized Residual : -4.732)


Test for Normality


Durbin-Watson D Statistic   | 1.
First Order Autocorrelation | 0.345


There are two outliers in the data: cases 30 and 31. These are the same two that we
lassoed earlier in the probability plot.


SYSTAT performs diagnostics to check whether the data meet the underlying assumptions
for ANOVA, Linear Regression, and General Linear Models (GLM). Diagnostics
speed up the analysis and help produce more accurate results by alerting you to
problems with the data. The Durbin-Watson D statistic and the first-order
autocorrelation, which check the residuals for serial correlation, are part of these
diagnostics and appear by default.


The Options tab in the ANOVA dialog box provides additional diagnostics. The
Shapiro-Wilk option tests the residuals for normality. In the Test for Normality output
above, the p-value indicates (as in any hypothesis test) whether the hypothesis being
tested, here the normality of the residuals, should be rejected. The smaller the p-value,
the stronger the evidence against the hypothesis. Since the p-value here is 0 to three
decimal places, the hypothesis that the residuals are normal is rejected. When the
assumption of normal residuals cannot be justified even for a transformed variable, we
may consider nonparametric methods, which do not depend on such assumptions.


Nonparametric tests 


Now we see how the question answered earlier using ANOVA (with its normality
assumption on residuals) can be answered by a nonparametric test, which does not
make this assumption. You might ask: why bother with ANOVA at all? The
answer is that if the normality assumption actually holds, ANOVA is the more powerful
method, but it is not valid when the assumption fails. If we do not have a good
distribution model for URANLOG or another transformed variable, it is safer to use a
distribution-free (nonparametric) method, even though it is less powerful. For a
nonparametric test for the equality of URANLOG levels at the various horizons:


m From the menus choose:


Analyze 
Nonparametric Tests 
Kruskal-Wallis... 


m Select URANLOG as the Selected variable(s) and HORIZON as the Grouping 
variable. 


Output from Kruskal-Wallis Test
Kruskal-Wallis One-way Analysis of Variance for 127 Cases


Categorical values encountered during processing are
Variables          | Levels
-------------------+----------------------------------
HORIZON (5 levels) | 1.000 2.000 3.000 4.000 5.000

Dependent Variable | URANLOG
Grouping Variable  | HORIZON

Group  Count  Rank Sum
1         43  2851.500
2         18   986.000
3         21  1880.500
4         29  1455.000
5         16   955.000


Kruskal-Wallis Test Statistic : 15.731
p-value is 0.003 assuming Chi-square Distribution with 4 df


From the Kruskal-Wallis One-way Analysis of Variance output, the test statistic, referred
to a chi-square distribution with 4 degrees of freedom, has a p-value of 0.003: if the
producing horizons did not differ in uranium level, a difference between the groups this
large would arise only about 0.3% of the time. Thus we conclude that the uranium level
differs significantly across producing horizons. We reached the same qualitative
conclusion from ANOVA and its Quick Graph, but the strength of the evidence differs: the
p-value in ANOVA was 0.014; here it is 0.003.
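The Kruskal-Wallis statistic can be recomputed from the counts and rank sums printed above with H = 12/(N(N+1)) · Σ R_i²/n_i − 3(N+1). For exactly 4 degrees of freedom, the chi-square upper-tail probability has the closed form e^(−x/2)(1 + x/2), so no statistics library is needed. This Python sketch omits any correction for tied ranks; it reproduces the printed 15.731, which suggests that ties have little effect on these data. The function names are hypothetical.

```python
import math

def kruskal_wallis_h(counts, rank_sums):
    """H = 12/(N(N+1)) * sum(R_i**2 / n_i) - 3(N+1), with no tie correction."""
    n_total = sum(counts)
    total = sum(r * r / n for n, r in zip(counts, rank_sums))
    return 12.0 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)

def chi2_sf_4df(x):
    """Upper-tail chi-square probability for exactly 4 degrees of freedom."""
    return math.exp(-x / 2) * (1 + x / 2)

counts = [43, 18, 21, 29, 16]                     # Count column above
rank_sums = [2851.5, 986.0, 1880.5, 1455.0, 955.0]  # Rank Sum column above
h = kruskal_wallis_h(counts, rank_sums)
print(round(h, 3), round(chi2_sf_4df(h), 3))  # 15.731 0.003
```

As a quick consistency check, the rank sums add up to N(N+1)/2 = 127 · 128 / 2 = 8128, as they must when all 127 cases are ranked together.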


Advanced Graphics 


This part of the tour explores SYSTAT’s advanced graphics capabilities, including 3-D 
rotation, animation, zooming using the Dynamic Explorer, smoothers, contour plots, 
and Page view. (The graphics in this section are best viewed in 16-bit or 32-bit true 
color on a high-resolution monitor.) 

From the preceding statistical analysis, we can conclude that there are differences
in the uranium level between the producing horizons. However, we also have the
latitude and longitude for each sample, so we can perform a geographic analysis to
better pinpoint the variations in uranium. To accomplish this, we will apply a
smoothing technique called “kriging” (pronounced kree-ging) to fit a surface to a 3-D
scatterplot of uranium by latitude and longitude. Kriging is a smoothing technique often
used in geostatistics. It uses local information around points to interpolate complex and
irregular geographic patterns.
Kriging Smoother


From the menus, submit the file GDWTR2DM. 


m From the menus choose: 
File 
Submit 
File... 


m Select the file GDWTR2DM from the ‘Miscellaneous’ subfolder of the ‘command’ 
directory and click Open. 


The following graph is displayed in the Output Editor: 


Actual Uranium and Kriging Smoother by Geography 


Uranium 


This plot shows the level of uranium against latitude and longitude (the data points)
and the kriging smoother (the surface). The plot provides us with a topography of the
uranium level, and we can see immediately that there is a pronounced peak near the
center of the sampling area.


Rotation 


In the Dynamic Explorer, the rotation arrows are now active. You can use them
interactively to rotate the plot in three dimensions,
allowing you to examine your data from all angles. Try pressing each of the four
rotation arrows to see how the plot changes.
Notable features include:


m True graphical rotation with automatic recalculation of the graph upon each
rotation. (SYSTAT does not just rotate a picture or bitmap; it physically transforms
the graph data and replots the graph and all of its elements in real time with each
rotation.)
m Realistic 3-D lighting to increase the volume effect.
m 3-D fonts on each axis that rotate along with the graph.
m The ability to view from all angles, including above and below.
m Perspective: closer data points look larger and more distant points look smaller.


Smoothers 


SYSTAT offers 126 nonparametric smoothers for exploratory analysis. In addition, 
nineteen smoothers can be directly added to graphical output. The smoothing options 
available for scatterplots are: 


None LOWESS Inverse Andrews 
Linear DWLS Mean Bisquare 
Quadratic Spline Median Huber 
Log Step Mode Trimmed 
Power NEXPO Midrange Kriging


Smoothers help you view your data in unique and informative ways. In this case, we 
are using kriging because it is especially designed for examining spatial distributions 
such as mineral deposits. 


Tension of Smoothers 


Each smoother has a tension associated with it. If you consider the smoother to be a 
string or membrane loosely attached to each data point, then the higher the tension on 
the ends of the string, the less influence any individual point has and the smoother 
averages across them all. The lower the tension on the ends of the string, the greater 
the influence of the individual data points, and the smoother approaches a path that 
passes through each point. 
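The string analogy can be tried out numerically. SYSTAT's exact definition of kriging tension is not given here, so the Python sketch below swaps in a simpler technique, a Gaussian kernel (Nadaraya-Watson) smoother, whose bandwidth plays a role analogous to tension: a small bandwidth lets the curve pass through each point, and a large bandwidth averages across all points. This is an analogy only, not kriging, and the data are made up.

```python
import math

def kernel_smooth(xs, ys, x0, bandwidth):
    """Gaussian-weighted average of ys at x0 (Nadaraya-Watson smoother)."""
    weights = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 0.0, 10.0, 0.0, 0.0]  # made-up data: one high reading, like a local peak

# Small bandwidth ~ low tension (follows the point); large ~ high tension (averages)
for bandwidth in (0.1, 1.0, 10.0):
    print(bandwidth, round(kernel_smooth(xs, ys, 2.0, bandwidth), 2))
```

At the peak, the smoothed value falls from about 10 toward the overall mean of 2 as the bandwidth grows, just as the kriging surface flattens when tension is raised.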
In addition to rotating the plot, you can use the Graph Properties dialog box to
alter the tension of the kriging smoother.


m To open the Graph Properties dialog box, right-click the graph in the Graph Editor
and select Properties.
m Click the Graph tab in the Graph Properties dialog box.
m Use the Up arrow key on the keyboard to select the graph "Actual Uranium and
Kriging Smoother by Geography".
m Click the Element tab and select the Smoother tab.
m Select Kriging from the Method combo box.
m In the Tension combo box, use the Down arrow key to change the tension value from
0.35 to 0.90.


Actual Uranium and Kriging Smoother by Geography 


Uranium 




Notice how the surface becomes flatter and lower; recall from the histogram that most
samples have a low uranium level. Now decrease the tension from 0.90 to 0.10.
Actual Uranium and Kriging Smoother by Geography


Uranium 


Notice how the surface reaches out to each individual point. 


Page View 


If you now switch to Page View by choosing from the menus:
View
Page View...
you can see that the Dynamic Explorer capabilities (rotation, animation, and zoom) are
available in Page View as well as in Graph View. In addition, you can position
the chart by dragging it around on the page.
Contour Plot of the Kriging Smoother


So far we have looked at these data by producing horizon and by latitude and longitude.
SYSTAT allows us to combine these two pieces of information by tailoring and
coloring symbols. As a final analysis, we will use another advanced graphing
technique: a contour plot of the kriging smoother. This final plot consists of successive
horizontal slices (lines of constant uranium level) through the surface of the kriging
smoother, overlaid on the data coded by producing horizon. From the menus, submit the
file GDWTR3DM.
m From the menus choose:
File
Submit
File...


m Select GDWTR3DM from the ‘Miscellaneous’ subfolder of the ‘command’ 
directory and click Open. 


The following graph is displayed: 
Actual Uranium and Kriging Smoother by Geography
The plot is simply a different view of the 3-D plot, but now we can use the contours to
pinpoint the high levels of uranium with respect to the producing horizons. The peaks 
of the kriging smoother are represented by tighter, brighter yellow and red contours, 
while the valleys are represented by dashed blue and green contours. The actual data 
points are distinguished in color and symbol by producing horizon. Notice how the 
peak is in the middle of the Quartermaster group; this is why it had the highest value 
in the earlier ANOVA. We can also see that the uranium level is not uniformly higher 
throughout this producing horizon but is highly localized. 


Advanced Statistics 


The kriging smoother provided a quick geographic visualization of uranium
concentrations. SYSTAT also provides a comprehensive spatial statistics procedure for
analyzing and modeling geographic data. You can create variograms and perform
stochastic simulation or kriging.
Spatial Statistics dialog box (Model, Variogram, Options, Kriging, Grid, and Resampling tabs) with URANLOG as the dependent variable
Summary


At this point, we have made some significant discoveries about the groundwater data:
we know where the uranium is geographically concentrated, both in terms of
producing horizon and in terms of latitude and longitude. We also have some very
high-quality graphics to communicate our findings in print or in a presentation. SYSTAT
has taken us from data to discovery.

By the way, this groundwater application has many more areas to explore than
the few that we have examined in this tour. For example, we have not even looked at
the relationships between uranium and the other elements in the data set. You are
encouraged to explore the power of SYSTAT further through this application,
beginning with any of the other potential analyses mentioned earlier.
Alternatively, examine any of the other 16 applications provided with SYSTAT. You
can access them either through the Application Gallery in the Help system Table of
Contents or through the chapter “Applications” on p. 283 in the Getting Started
manual.


References for Groundwater Data 


The groundwater data used in these examples were obtained from the following
sources:


Original Source. Nichols, C. E., Kane, V. E., Browning, M. T., and Cagle, G. W. (1976).
National Uranium Resource Evaluation, Northwest Texas Pilot Geochemical Survey,
Union Carbide Corporation, Nuclear Division, Oak Ridge Gaseous Diffusion Plant, Oak
Ridge, Tenn., K/UR-1, U.S. Department of Energy, Grand Junction, Colo., GJBX-60(76),
231.

Data Reference. Andrews, D. F. and Herzberg, A. M. (1985). Data: A Collection of
Problems from Many Fields for the Student and Research Worker, 123-126.
Springer-Verlag, New York.
Command Language


(Revised by Rajashree Kamath) 


Most SYSTAT commands are accessible from the menus and dialog boxes. When you
make selections, SYSTAT generates the corresponding commands. Some users,
however, may prefer to bypass the menus and type the commands directly at the
command prompt. This is particularly useful because some options are available only
through commands, not through menus or dialog boxes. Whenever you run
an analysis, whether you use the menus or type the commands, SYSTAT stores the
processed commands in the command log.

A command file is simply a text file that contains SYSTAT commands. Saving 
your analysis in a command file allows you to repeat it at a later date. Many 
government agencies, for example, require that command files be submitted with 
reports that contain computer-generated results. SYSTAT provides you with a 
command file editor in its Commandspace. 

You can also create command templates. A template allows customized, repeatable
analyses by letting the user specify characteristics of the analysis as SYSTAT
processes the commands. For example, you can select the data file and variables to
use on each submission of the template. This flexibility makes templates particularly
useful for analyses that you perform often on different data files, or for combining
analytical procedures and graphs.
Commandspace


Some of the functionality provided by SYSTAT's command language may not be
available in the dialog box interface. Moreover, using the command language enables
you to save sets of commands that you use on a routine basis.


Commands are run in the Commandspace of the SYSTAT window. The 
Commandspace has three tabs, each of which allows you to access a different 
functionality of the command language. 




Interactive tab. Selecting the Interactive tab enables you to enter the commands in the 
interactive mode. Type commands at the command prompt (>) and issue them by 
pressing the Enter key. You can save the contents of the tab (SYSTAT excludes the 
prompt), and then use the file as a batch file. 


Batch (Untitled) tab. Selecting the Untitled tab enables you to operate in batch mode. 
You can open any number of existing command files, and edit or submit any of these 
files. You can also type an entire set of commands and submit the content of the tab or 
portions of it. This tab is labeled Untitled until its content is saved. The name that you 
specify while saving the content replaces the caption ‘Untitled’ on the tab. 


Log tab. Selecting the Log tab enables you to examine the read-only log of the 
commands that you have run during your session. You can save the command log or 
even submit all or part of it. 


When the Commandspace is active, you can cycle through the three tabs using the 
following keyboard shortcuts: 

m CTRL+ALT+TAB. Shifts focus one tab to the right.

m CTRL+ALT+SHIFT+TAB. Shifts focus one tab to the left.
What Do Commands Look Like?
Here are some examples of SYSTAT commands: 
XTAB 1 
USE food 2 
PLENGTH NONE/ LIST 3 
TAB food$ brand$ diet$ 4 
CSTAT 5 
BY diet$ 6 
CSTAT / MEDIAN MIN MAX MEAN CI 7
BY 8 
CORR 9 
PEARSON calories fat protein cost / BONF 10 
SPLOM calories fat protein cost 11 
PLOT calories * protein / LABEL=brand$ 12 


The CSTAT command on line 5 produces a set of descriptive statistics for all seven
numeric variables in the FOOD data file. Line 7 asks for the median, minimum,
maximum, mean, and confidence intervals for all of the variables, computed separately
for each group of DIET$ specified by the BY command on line 6.


SYSTAT commands are made up of keywords that are meaningful for the function they
perform. Where possible, several meaningful words for a given function are accepted.
For example, CSTATISTICS, CSTATS, and STATISTICS will all give you descriptive
statistics. Likewise, PLENGTH and DISPLAY both allow you to specify the length of
output produced by a given command. A keyword typically consists of letters and
sometimes digits. Other characters, such as the hyphen and underscore, are avoided;
spaces and characters such as the plus (+), minus (-), asterisk (*), hash (#), and
exclamation mark (!) are not used because they have other roles in a command.


Interactive Command Entry 


Commands can be issued directly when the Interactive tab is selected in the
Commandspace. To issue a command, type the command and press the Enter key.
SYSTAT’s commands fall into four broad categories: general
commands, data-related commands, graph-related commands, and statistical
commands. The statistical commands are in turn grouped by module. While the other
commands are available for use at any time, the statistical commands function only
after you enter, or "load", the relevant module. The modules
are as follows:
ANOVA BETACORR BAYESIAN 
CLUSTER CONJOINT CORAN CORR 
DESIGN DISCRIM FACTOR FITDIST 
GAUGE GLM IIDMC LOGIT 
LOGLIN MANOVA MCMC 
MDS MISSING MIX MIXED 
MSIGMA NONLIN NPAR PERMAP 
PLS POSAC POWER PROBIT 
QC RAMONA RANDSAMP RANKREG
RDISCRIM REGRESS RIDGE ROBREG 
RSM SAVINGS SERIES SETCOR 
SIGNAL SMOOTH SPATIAL 
SURVIVAL TESTAT TESTING TLOSS 
TREES TSLS ve XTAB 
Note: 
1. There are three other modules in SYSTAT that are not listed above, viz. BASIC, 
MATRIX and STATS. Commands related to these modules will work directly 
without having to load the modules. In other words, they function just like the 
general commands. 
2. Some of these modules are available only as add-ons. 
m To enter a module, type its name after the prompt, and press the Enter key. For 
example, type: 
XTAB 
m Next, identify which data to use. For example, type 
USE ourworld 
and press the Enter key. 


m Now type a command line: 


TABULATE leader$ group/MEAN= pop_1983 


m Press the Enter key to obtain the output. 


To create graphs, type the desired graph command followed by the variables to use.
Specify optional settings to customize the resulting display. Valid graph commands
include:
BAR DENSITY DOT DRAW FOURIER
FPLOT ICON LINE MAP PARALLEL
PIE PLOT PPLOT PROFILE PYRAMID
QPLOT SPLOM WRITE


Refer to the Language Reference volume for details regarding general and data-related
commands.


Command Syntax 


Most SYSTAT commands have three parts: a command, one or more arguments, and options.


command argument / options 


Each module name or command must start on a new line. A command must be 
separated from its argument by a space (the equal sign is not allowed) and options are 
separated from commands by a slash (/). For example: 


CSTATISTICS urban babymort / MEAN SEM MEDIAN 


m The command specifies the task, in this case to display statistics.

m The arguments are the names of the variables, URBAN and BABYMORT, for which
statistics will be computed.

m The options (following the slash) specify which statistics you want to see. If you
do not specify any options, SYSTAT displays a default set of statistics.
In general, the argument may be one or more variables, numbers or strings separated
by a space or comma, variable lists separated by the asterisk (*), file names, folder
names, a specific keyword that may or may not be equated to a number, an expression,
an equation or an inequality. Each option is a keyword that may or may not be equated
to an option value (when a value is given, the equal sign is required). The option value
has the same possibilities as the argument.


Hot versus Cold Commands 


Some commands execute a task immediately, while others do not. We call these hot and 
cold commands, respectively. 


Hot commands. These commands initiate immediate action. For example, if you type 
LIST and press the Enter key, SYSTAT lists cases for all variables in the current data 
file. 


Cold commands. These commands set formats or specify conditions. For example, 
PAGE WIDE specifies the format for subsequent output, but output is not actually 
produced until you issue further commands. Similarly, the SAVE command in modules
specifies the file to save results and data to, but does not in itself trigger the saving of
results; the next hot command does that.
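The hot/cold distinction amounts to "store a setting now, act on it later". The Python sketch below models only that idea; the `Session` class, its command sets, and the output format are hypothetical and cover just the PAGE and LIST examples from the text.

```python
class Session:
    """Hypothetical session: cold commands store settings, hot commands act."""

    COLD = {"PAGE", "SAVE"}
    HOT = {"LIST"}

    def __init__(self):
        self.settings = {}
        self.output = []

    def issue(self, command, argument=""):
        command = command.upper()
        if command in self.COLD:
            self.settings[command] = argument  # remembered; nothing produced yet
        elif command in self.HOT:
            page = self.settings.get("PAGE", "default")
            self.output.append(f"{command} (page={page})")  # uses stored settings
        else:
            raise ValueError(f"unknown command: {command}")

s = Session()
s.issue("PAGE", "WIDE")
print(s.output)  # [] : the cold command produced no output
s.issue("LIST")
print(s.output)  # ['LIST (page=WIDE)']
```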


Command Syntax Rules 


Upper or lower case. Commands are not case sensitive. You can type commands in 
upper or lower case or both: 


CSTATISTICS or cstatistics or CStatistics


The only time SYSTAT distinguishes between upper and lower case is in the values of 
string variables. In other words, for a variable named SEX$, SYSTAT considers the text 
values “male” and “MALE” to be different. 


Abbreviating commands. You can shorten commands and options to their first two to
seven letters, as long as the resulting abbreviation is unique and is a natural, commonly
used shortening of the word. For example, COV, COVA, and COVAR are all
permissible abbreviations of COVARIANCE. For commands, abbreviations of any length
up to the full word (even beyond seven characters) are supported. For example:


m CSTATISTICS can be shortened as CSTA or CST or CS.
m DENSITY var can be shortened as DEN var.
m HELP phrase can be shortened as HE phrase.


In the case of commands within a module, the abbreviation needs to be unique within
the module. For example, STAR, STAN, STE, and STO are interpreted as START,
STANDARDIZE, STEP, and STOP, respectively, within the GLM module. Outside GLM,
STAN is interpreted as STANDARDIZE, the global command for standardizing variables.


Note: BASIC commands, module and variable names must be typed in full; they 
cannot be abbreviated. 


Interpreting common commands. Some commands, such as STANDARDIZE, perform
different functions within and outside modules. Such commands are interpreted in the
following priority order: BASIC commands, then commands of the module currently
loaded (if any), then all other commands. To use a global command (one that is
available irrespective of the module loaded) while a module is loaded, first issue EXIT
to leave the module.
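The abbreviation and priority rules above can be combined into a small lookup routine. The Python sketch below is an illustration under stated assumptions: `resolve` is a hypothetical name; the command lists are short samples, not SYSTAT's real vocabularies; BASIC commands must match in full, other tiers accept a unique prefix of at least two letters, and an ambiguous prefix within a tier is an error.

```python
def resolve(word, tiers, min_len=2):
    """Return (tier, command) from the first tier that matches `word`.

    Each tier is (name, commands, allow_abbrev). Illustrative only.
    """
    word = word.upper()  # commands are not case sensitive
    for tier_name, commands, allow_abbrev in tiers:
        if word in commands:
            return tier_name, word
        if allow_abbrev and len(word) >= min_len:
            matches = sorted(c for c in commands if c.startswith(word))
            if len(matches) == 1:
                return tier_name, matches[0]
            if len(matches) > 1:
                raise ValueError(f"{word} is ambiguous: {', '.join(matches)}")
    raise ValueError(f"{word} is not a recognized command")

# Hypothetical sample vocabularies, in priority order
BASIC = ("BASIC", {"LET", "IF", "FOR", "NEXT"}, False)  # must be typed in full
GLM = ("GLM", {"START", "STANDARDIZE", "STEP", "STOP", "MODEL", "ESTIMATE"}, True)
GLOBAL = ("GLOBAL", {"STANDARDIZE", "CSTATISTICS", "USE", "SELECT"}, True)

in_glm = [BASIC, GLM, GLOBAL]  # GLM module loaded
no_module = [BASIC, GLOBAL]    # after EXIT

for w in ("STAR", "STAN", "STE", "STO"):
    print(w, resolve(w, in_glm))        # resolved within GLM
print("STAN", resolve("STAN", no_module))  # resolved globally
```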


Retrieving commands. SYSTAT holds the most recently processed command lines in 
memory. From the Interactive tab of the Commandspace, use the Up arrow or F9 key 
to scroll through the commands. Press Up arrow or F9 once to recall the previous 
command, press it again to see the command before that, and so on. Use the General 
tab of the Global Options dialog to define the number and source of commands to 
retain in memory. 

Continuing long commands onto a second line. To continue a command onto another 
line, type a comma at the end of the line. For example, typing 


CSTAT urban babymort pop_1990 / MEAN SEM MEDIAN 


is the same as: 


CSTAT urban babymort,
pop_1990 / MEAN SEM,
MEDIAN
Do not use a comma at the end of the last line of a command; this will cause SYSTAT
to wait for the rest of the command. Also, a single word cannot be split across two
lines. For example:


USE OUR, 
WORLD 

or 

US,
E OURWORLD


are invalid, whereas the following is valid:


USE, 
OURWORLD 
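The continuation rule can be modeled as joining physical lines into logical commands. This Python sketch is an illustration under the assumption that a trailing comma simply acts as an ordinary delimiter (consistent with commas and spaces being interchangeable); it is not SYSTAT's own line reader.

```python
def join_continued(lines):
    """Join physical lines into logical commands.

    A line ending in a comma continues onto the next line; the comma acts as
    an ordinary delimiter, so it is replaced by a space.
    """
    commands, pending = [], []
    for raw in lines:
        line = raw.strip()
        if line.endswith(","):
            pending.append(line[:-1].rstrip())
            continue
        pending.append(line)
        commands.append(" ".join(part for part in pending if part))
        pending = []
    if pending:
        raise ValueError("the last line ends with a comma; SYSTAT would wait for more input")
    return commands


print(join_continued(["CSTAT urban babymort,", "pop_1990 / MEAN SEM,", "MEDIAN"]))
print(join_continued(["USE,", "OURWORLD"]))   # valid: USE OURWORLD
print(join_continued(["US,", "E OURWORLD"]))  # the word stays split: US E OURWORLD
```

The last call shows why a word cannot be split: the pieces are rejoined with a delimiter between them, so US and E remain two separate words.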


Commas and spaces. Except when used to continue a command from one line to the 
next, and in the case of functions, commas and spaces are interchangeable as 
delimiters. For example, the following are equivalent: 


CSTAT urban babymort pop_1990 
CSTAT urban, babymort, pop_1990 
CSTAT urban,babymort, pop_1990 


Quotation marks. You must put quotation marks around any character (string) data
that are values of a string variable, that need to be case sensitive, or that contain
spaces.


For example, type: 


NOTE 'Statistical Analysis'


to display a note in the output in title case and on a single line. If your data file has a 
string variable for country names written in title case, the following command will 
select the case corresponding to Sweden: 


SELECT country$ = 'Sweden'


You can use either double (" ") or single (' ') quotes. If you use dialogs to generate
commands involving string variables, you need not specify quotation marks.

In certain commands that involve values taken by string variables, if you do not use
quotes around a value, SYSTAT looks for the value written in uppercase. For
example, SPECIFY gov$[Democracy] + urban$[city] will be interpreted as SPECIFY
gov$[DEMOCRACY] + urban$[CITY], whereas SPECIFY gov$['Democracy'] +
urban$['city'] will be interpreted as SPECIFY gov$[Democracy] + urban$[city].


Furthermore, for any command involving filenames (such as USE and SAVE), 
filenames and file paths containing spaces require quotation marks around them. 


Specifying matrices. Some commands and options accept matrices as their arguments.
A matrix must be enclosed in square brackets, and the end of each row except the last
must be indicated by a semicolon.
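The bracket-and-semicolon layout can be illustrated with a small parser. This Python sketch is an assumption-based illustration, not SYSTAT's grammar: it assumes numeric entries separated by spaces or commas, and it rejects ragged rows, so a stray semicolon after the last row produces an error.

```python
def parse_matrix(text):
    """Parse '[1 2; 3 4]' into a list of rows of floats."""
    text = text.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError("a matrix must be enclosed in square brackets")
    rows = []
    for row_text in text[1:-1].split(";"):  # a semicolon ends each row but the last
        # Assumption: entries within a row may be separated by spaces or commas.
        cells = row_text.replace(",", " ").split()
        rows.append([float(cell) for cell in cells])
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all rows must have the same number of entries")
    return rows


print(parse_matrix("[1 2 3; 4 5 6]"))
```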


SYSTAT functions. A typical SYSTAT function has the syntax FUN(par1, par2, ...)
where par1, par2, ... are the parameters of the function FUN. When a function takes
more than one parameter, the parameters must be separated by commas (a space cannot
be used as a delimiter). For many functions the parameters are optional (default values
are used), in which case the function must be written as FUN(). For instance, ZRN()
generates random numbers from the standard normal distribution.

Unit of measurement. Certain commands and options related to graphs allow you to 
specify the unit of measurement. The available units of measurement are inches, 
centimeters and points that can be indicated using the keywords IN, CM and PT 
respectively. When a unit is used in the arguments of a command, it must be separated
from the number by a space. For example, DEPTH 2 CM sets the depth of a graph
to 2 centimeters. In the case of option values, a number can be suffixed by the unit of 
measurement with or without a space. For example, the option HEIGHT = 200PT sets 
the height of a graph to 200 points. 
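The two spacing rules can be contrasted in a short Python sketch. It assumes the usual typographic conversions (72 points per inch, 2.54 centimeters per inch); the manual does not state which factors SYSTAT uses internally, and the function name to_points is hypothetical.

```python
import re

# Assumed conversion factors: 72 points per inch, 2.54 centimeters per inch.
POINTS_PER_UNIT = {"IN": 72.0, "CM": 72.0 / 2.54, "PT": 1.0}


def to_points(text, space_required):
    """Convert '2 CM', '200PT', ... to points.

    space_required=True models command arguments (number and unit must be
    separated by a space); False models option values (space optional).
    """
    gap = r"\s+" if space_required else r"\s*"
    m = re.fullmatch(r"\s*(\d+(?:\.\d+)?)" + gap + r"(IN|CM|PT)\s*", text, re.IGNORECASE)
    if m is None:
        raise ValueError(f"cannot read a measurement from {text!r}")
    return float(m.group(1)) * POINTS_PER_UNIT[m.group(2).upper()]


print(to_points("2 CM", space_required=True))     # as in DEPTH 2 CM
print(to_points("200PT", space_required=False))   # as in HEIGHT = 200PT
```

With space_required=True, "2CM" is rejected, mirroring the rule for command arguments.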


Shortcuts 
There are some shortcuts you can use when typing commands. 
Listing consecutive variables. When you want to specify more than two variables that
are consecutive in the data file, you can type the first and last variable and separate
them with two periods (..) instead of typing the entire list. This shortcut will be referred
to as the ellipsis. For example, instead of typing


CSTAT babymort life_exp gnp_82 gnp_86 gdp_cap 




you can type: 


CSTAT babymort .. gdp_cap 


You can also combine individual variable names and lists of consecutive variables
specified with the ellipsis in the same command.
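The ellipsis expands to every variable between the two endpoints in file order. The Python sketch below models that expansion; FILE_ORDER is a hypothetical column order (only the relative positions matter), the ".." is assumed to be separated by spaces, and rejecting a backward range is an assumption of this sketch.

```python
# Hypothetical column order of a data file; only the order matters here.
FILE_ORDER = ["country$", "urban", "babymort", "life_exp", "gnp_82",
              "gnp_86", "gdp_cap", "pop_1990"]


def expand_ellipsis(tokens, file_order):
    """Replace each `first .. last` triple with the variables between them."""
    position = {name.upper(): i for i, name in enumerate(file_order)}
    result, i = [], 0
    while i < len(tokens):
        if i + 2 < len(tokens) and tokens[i + 1] == "..":
            start = position[tokens[i].upper()]
            end = position[tokens[i + 2].upper()]
            if end < start:  # assumption: the range must run forward
                raise ValueError(f"{tokens[i]} comes after {tokens[i + 2]}")
            result.extend(file_order[start:end + 1])
            i += 3
        else:
            result.append(tokens[i])  # an ordinary variable name
            i += 1
    return result


command = "CSTAT babymort .. gdp_cap"
name, *args = command.split()
print(name, " ".join(expand_ellipsis(args, FILE_ORDER)))
```

The same function handles a mix such as urban gnp_82 .. gdp_cap, which expands to urban gnp_82 gnp_86 gdp_cap.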


Multiple transformations: the @ sign. When you want to perform the same 
transformation on several variables, you can use the @ sign instead of typing a separate 
line for each transformation. For example, 


LET gdp_cap = L10(gdp_cap) 

LET mil = L10(mil) 

LET gnp_86 = L10(gnp_86) 

is the same as: 

LET (gdp_cap, mil, gnp_86) = L10(@) 


The @ sign acts as a placeholder for the variable names. The variable names must be 
separated by commas and enclosed within parentheses ( ). 
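The @ placeholder behaves like a textual template that is stamped out once per listed variable. This Python sketch reproduces the expansion shown above; it is a model of the documented behavior, not SYSTAT code, and it handles only the LET (var1, var2, ...) = expression form.

```python
import re


def expand_at(statement):
    """Expand LET (a, b, ...) = f(@) into one LET statement per variable."""
    m = re.fullmatch(r"\s*LET\s*\(([^)]*)\)\s*=\s*(.+?)\s*", statement, re.IGNORECASE)
    if m is None:
        raise ValueError("expected LET (var1, var2, ...) = expression containing @")
    names = [n.strip() for n in m.group(1).split(",")]  # names are comma-separated
    expression = m.group(2)
    # Each @ in the expression stands for the variable currently being transformed.
    return [f"LET {name} = {expression.replace('@', name)}" for name in names]


for line in expand_at("LET (gdp_cap, mil, gnp_86) = L10(@)"):
    print(line)
```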


Online Help for Commands 


SYSTAT’s online help system provides easy access to information about SYSTAT 
commands. At the command prompt, type HELP followed by the name of a module or 
command for which you want help. For example, you can access help on the CORR 
module by typing: 


HELP CORR 


If you are already in the CORR module, you can type just HELP to get a list of 
commands available within CORR, HELP followed by the name of a command that 
you know belongs to the CORR module (for example, HELP PEARSON) or HELP 
followed by the name of any other module or global command (for example, HELP 
CLUSTER). 

You can also start help by choosing Index from the Help menu and selecting the
desired command from the list. Yet another alternative is to type the command in any
tab of the Commandspace and then either click on it and press Ctrl+F1, or right-click
on it and select the HELP command.


Autocomplete commands 


As you begin typing commands in the Interactive or batch (Untitled) tab of the 
Commandspace, you will be prompted with the possible command keywords, 
available data files, or available variables. When a letter is typed, all commands
beginning with that letter appear in a dropdown list; select the desired command
or continue typing. For the USE and VIEW commands, when you press the spacebar and
then type a letter, the data files in the SYSTAT Data folder (or the folder specified under
Open data in the Edit: Options dialog) are listed. For any other command, if a data
file is open, pressing the spacebar and typing a letter lists all available variable names
beginning with that letter in a dropdown list.

Command autocompletion is enabled by default. You can turn it off by unchecking 
Autocomplete commands in the Edit: Options dialog. 


Command Coloring 


The commands, variable names, numbers, strings and comments (REM statements)
that you type are displayed in distinct colors, as follows:


Commands Blue 
Variable names Violet 
Numbers Pink 
Strings in double quotes Pink 
Comments Green 


Coloring makes it easy for you to identify the various components of a command 
line, thereby reducing the risk of making syntax errors.


Command Files


A command file is a text file that contains SYSTAT commands. Saving your analyses 
in a command file allows you to repeat them at a later date. 
You can create a command file by selecting the batch (Untitled) tab in the
Commandspace. This tab corresponds to a simple text editor; type the desired 
commands line by line. When you are done, save the commands to a file or submit 
them to SYSTAT for processing. In contrast to the Interactive tab, no interactive 
prompt (>) appears on the batch tab; commands are not processed until the resulting 
command file is submitted to SYSTAT. 


[Figure: batch (Untitled) tab of the Commandspace containing a command file with
the commands XTAB, USE OURWORLD, and TABULATE leader$ group / MEAN = pop_1983]


If you find any of the SYSTAT examples relevant to your analysis, you can open the
corresponding example command file in the SYSTAT Command folder, edit it to suit your
data, and save it under a different filename. In fact, you can create or open any number
of command files simultaneously, copy and paste among them, edit any of them, and
submit any of them.


To create a new command file 


■ From the menus, choose:
File 
New 
Command... 


or 


Click the batch (Untitled) tab and press the New toolbar button on the Standard toolbar. 


■ Type SYSTAT commands in the batch (Untitled) tab. For more information on
SYSTAT commands, see the SYSTAT Language Reference.

USE OURWORLD 
CSTAT pop_1983, urban, health, babymort 

■ To save the command file, with the corresponding tab active, from the menus
choose:


File 
Save Active File... 


Alternatively, click Command under Save. 
■ In the Save in field, select the appropriate drive or folder to save to.


■ Type a suitable filename, or select an existing file from the list if you want to
overwrite it, and press Save.
[Figure: Save As dialog with Files of type set to SYC Files (*.syc)]


Note: 


■ To save a file under a different name, click Save As... and specify the desired
filename and path.

■ If you want to save the command file in ANSI format, select "SYC Files (ANSI)
(*.syc)" in the Files of type field. Select "All files" if you want to use a different
file extension.

■ To save all unsaved files, click Save All and specify appropriate filenames for each.


■ Instead of typing commands, you can perform the corresponding actions through
menus and dialogs, and then save the generated commands by selecting Save or Save As
with the Log tab active.

■ The commands that you type line by line in the Interactive tab can also be saved to
a command file by selecting Save or Save As with the Interactive tab active.
To open a command file
■ From the menus, choose:
File 
Open 
Command 


or 


Click the batch (Untitled) tab and press the Open toolbar button on the Standard
toolbar.

■ In the Look in field, click the drive or folder that contains the command file you
want to open.

■ Click the command file name in the list that is displayed, and press Open.


[Figure: Open SYSTAT Command File dialog]


Note: 
■ If you do not see the command file you are looking for, you can choose a different
file type in the Files of type field.
■ You can also open a command file you used recently by clicking its name in the
Recent Files quadrant of the Startpage or in the File menu.


Working with Text 
■ To undo your last action, press Ctrl+Z.

■ To search for text, press Ctrl+F. In the Find what field, enter the text you want to
search for, and then press Find Next. To find additional instances of the same text,
continue to press Find Next. You can search for whole words only, do a
case-sensitive search, or search backwards.

■ You can also search and replace text by pressing Ctrl+H.


Printing Command Files 
■ Open the command file in an alternative command editor such as Notepad.
■ Use that editor's print option to print the command file.


Submitting Command Files 


When you submit a command file, SYSTAT executes the commands as if they were 
typed line by line at the command prompt. For example, suppose you have a text file 
of SYSTAT commands named TUTORIAL.SYC. You can execute the commands in the 
file in eight different ways: 


■ Issue a SUBMIT command from any SYSTAT procedure:


SUBMIT tutorial 


Note: Unless the command file is in the default directory for commands (specified in
the File Locations tab of the Global Options), you have to define the path for the file.
For information on Global Options, see Chapter 7, Customization of the SYSTAT
Environment.
■ In the SYSTAT window, from the menus choose:
File
Submit
File...


■ Open the command file in the batch (Untitled) tab of the Commandspace using the
File or context menu. From the Submit submenu of the File menu, you can then
submit the entire file (Window), submit from the cursor's location to the end of the
file (From Current Line to End), or submit just the current line (Current Line).

■ From the menus choose:
Utilities
User Menu
Menu List...
and click the item in the list. For information on creating menu items in the User
Menu, refer to Chapter 7, Customization of the SYSTAT Environment.

■ Navigate to the file's location on the hard disk in Windows Explorer and double-click
the file. The file opens in a new instance of the SYSTAT application; right-click in the
batch (Untitled) tab of the Commandspace and submit the file.


■ Use the DOS command syntax to (open or) submit the file. The details of this
method are explained later in this chapter.

■ Create a link to the command file in the Examples tab of the Workspace using the
Add Examples dialog, which opens when you click Add Examples on the Utilities
menu. Double-click the link, or right-click it and select Run, to execute the command
file as it is. You can also use the context menu to open the command file in the
batch tab, edit it, and then execute it. Refer to Chapter 7, Customization of the SYSTAT
Environment, to learn more about adding examples.

■ Open the command file in any external application such as Notepad, copy some or all
commands, right-click anywhere in the Commandspace, and select Submit
Clipboard.

To submit a range of commands, select the commands and choose Submit Selection
from the context menu. If the range includes the last command in the tab, use Submit
From Current Line to End. If you choose either Submit Window or Submit From
Current Line to End, SYSTAT prompts you to specify whether to submit the range or
not.


Alternative Command Editors 


Command files are ASCII text files having an SYC filename extension and containing 
command syntax. Hence, you can use any text editor to create command files. In your 
editor, type each command on a new line and save the resulting file as ASCII text. We 
recommend using the SYC extension when saving these files. Although any text file
containing commands can be processed, using an SYC extension for these files allows
maximal Windows functionality, such as double-clicking a file to open it
automatically. In addition, you can use a text editor in conjunction with the Windows
Clipboard to submit syntax for processing without creating command files or using the
Commandspace. After typing the commands in your editor, select and copy them. In
SYSTAT, select Submit Clipboard from the File menu. The software processes the
commands without changing any text on the Interactive or middle tabs of the
Commandspace.


Using a text editor for command entry allows you to hide the Commandspace, creating 
more area in which to display the output. As you change between the editing and 
processing environments, the currently active application appears in front of the other. 
Consequently, you can maximize the area for both the input and the output, switching 
between the two by toggling between the applications. You can also have multiple
command files open and submit commands from each of them by copying the commands
and choosing Submit Clipboard. However, the Clipboard holds only the last copied item,
so be sure the most recently copied text corresponds to the commands to be submitted.


Because the Commandspace itself is a text editor, you can also copy commands from 
any of the three tabs for subsequent submission via the Clipboard. However, other 
submission methods (Submit Window, Submit from Current Line to End, Submit 
Current Line and Submit Selection) offer the same functionality without replacing the 
contents of the Clipboard. Moreover, the command prompt (>) prevents successful 
submission of two or more command lines copied from the Interactive tab. 


Comments in Command Files 


The !! or REM command can be used for inserting comments in command files and for 
making a command inactive during the current run. All text following !! or REM on the 
same line is ignored. 


REM Now we merge files side-by-side 
REM MERGE file1 file2
MERGE file1 file3


The text following the first REM command remains in the command file. The MERGE 
statement in the second line is not invoked. 


The !! command can also be used at the end of another command line to append a
comment to it. Such comments can indicate what the command line does, why it was
written, which step of a procedure it is, or even the name of the person who wrote it.
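The two comment forms can be modeled as a line filter. In this Python sketch (an illustration, not SYSTAT's parser), REM is treated as a comment only when it is the first word of the line, as in the examples above, while !! discards the rest of any line; the sketch does not try to protect a !! that appears inside a quoted string.

```python
def strip_comments(line):
    """Return the executable part of a command line, or '' for a comment."""
    stripped = line.strip()
    words = stripped.split(maxsplit=1)
    if words and words[0].upper() == "REM":  # REM comments out the whole line
        return ""
    code, _, _ = stripped.partition("!!")    # !! ignores the rest of the line
    return code.rstrip()


script = [
    "REM Now we merge files side-by-side",
    "REM MERGE file1 file2",
    "MERGE file1 file3 !! keep only the second merge",
]
print([strip_comments(line) for line in script])
```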


Tip: To add comments that appear in your output, use the NOTE command. 


Commands to Control Output 


SYSTAT provides a number of commands to save and print output, as well as to control 
its appearance. These commands may be particularly useful when creating command
files.


OUTPUT command. Enables you to route subsequent plain text output to a file or a 
printer. 


PAGE command. Enables you to specify a narrow (80 columns, the default) or wide 
format (132 columns) for output. You can also specify a title that appears at the top of 
each printed output page. 


FORMAT command. Enables you to specify the number of character spaces per field 
displayed in data listings and matrix layouts, and the number of digits printed to the 
right of the decimal point. You can also display very small numbers in exponential 
notation (instead of being rounded to 0). 


NOTE command. Enables you to add comments to your output. For example: 


NOTE "THIS IS A COMMENT.",
"This is the second line of comments.",
"It's the 'third' line here!"


Each character string enclosed in either single or double quotation marks is printed on 
a separate line. A note can span any number of lines, and you can also specify an ASCII 
code to display the corresponding ASCII character. 


Command Log 


SYSTAT records the commands you specify during your current session in a temporary 
file called the command log. Select the Log tab in the Commandspace to view the 
command log. You can view, copy, submit, and save all of the commands stored in the 
command log at any time during a session. However, because the log serves as a 
command recorder, you cannot edit commands using the Log tab. 
[Figure: Log tab of the Commandspace, showing a USE command followed by
REM -- Following commands were produced BY the CSTAT DIALOG:
CSTATISTICS POP_1983 BABYMORT HEALTH / N MIN MAX MEAN SD
REM -- END of commands from the CSTAT DIALOG]


After selecting the Log tab, you can submit commands directly from the command log 
in four ways: 

■ Submit the entire log by choosing Submit Window from the File or context menus.
■ Submit the most recently processed commands by moving the cursor to the desired
starting point and choosing Submit From Current Line to End from the File or
context menus.
■ Submit a subset of commands by selecting the desired commands and choosing
Submit Selection from the context menu.
■ Submit the desired line by moving the cursor to the line and choosing Submit
Current Line from the context menu.


To modify commands before submission, copy the log contents, paste the copied 
portion to the batch (Untitled) tab or Interactive tab, edit the pasted commands, and 
submit the resulting syntax. 


Autorecovery 


The command log records only the commands from your current session. You cannot 
use the command log to recover commands from a previous session unless you saved 
those commands in a command file before exiting SYSTAT. However, SYSTAT saves
the log file of the session in case a fatal error occurs. You can specify the path where 
you want to save the file. To specify the path: 


■ From the menus choose:


Edit 
Options... 


■ Click the File Locations tab. Type or select the file path corresponding to the Work
data field.

■ Close the current SYSTAT session to activate the specified path.
By default, SYSTAT names the file Autosave0.syc. If a command file with that name
already exists, it uses Autosave1.syc, and so on. SYSTAT deletes the file when the
user quits the session normally; the file remains only if a fatal error occurs.
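The naming scheme amounts to picking the first unused number. This Python sketch illustrates the rule only; it takes a list of existing file names (for example, the result of os.listdir on the Work data folder) and compares names case-insensitively, which is an assumption based on Windows file-name behavior.

```python
def next_autosave_name(existing):
    """Return the first AutosaveN.syc name (N = 0, 1, 2, ...) not already taken."""
    taken = {name.lower() for name in existing}  # assumed case-insensitive, as on Windows
    n = 0
    while f"autosave{n}.syc" in taken:
        n += 1
    return f"Autosave{n}.syc"


print(next_autosave_name([]))
print(next_autosave_name(["Autosave0.syc", "Autosave1.syc", "notes.txt"]))
```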


Macros in SYSTAT: Record Script 


SYSTAT lets you record part or all of the commands logged in the current session so
that you can reuse them. To start or stop recording a script:


■ From the menus choose:


Utilities 
Start/Stop Recording... 


or 
■ Click the Record Script toolbar button.


The Record Script dialog pops up when you stop the recording. 


You can save the recorded script to a file and/or you can add it to the User Menu for 
use in subsequent sessions. For more information on the User Menu, see Chapter 7, 
Customization of the SYSTAT Environment. Quit the dialog by pressing Cancel if you 
do not want to save the recorded script. 

There is also another way to reuse the recorded commands: 


■ From the menus choose:
Utilities 
Macro 
Play Recording... 


or 
■ Click the Play Recording toolbar button.


Note: The Play Recording option can play only the latest recording. Consequently, a
recording is lost if you start recording another set of commands without saving it first.


Working with DOS Commands 


Many SYSTAT tasks can be performed with minimal user intervention. For instance,
there may be very large command files you want to execute, command files that
take a long time to produce output, or command files that produce a large number of
graphs, all of which you want to save. You can do all this and more in the Windows
environment; in fact, you can work with SYSTAT command files without opening the
SYSTAT application manually. Open the MS-DOS Prompt from the Windows Start
menu, or the Windows Run dialog, and type the following command line with the
appropriate command switches:


"filepath1\App\systat.exe" /switch(es) "filepath2\filename.xxx" 


where filepath1 is the SYSTAT installation folder path and filepath2 is the location of
the file on which SYSTAT will operate. (The quotes are required only if the file path or
filename contains spaces.) Depending on the switch(es) and .xxx you give, the tasks
described below can be automated:


/x  (.syc or .cmd)
    Opens SYSTAT and submits filename.syc.
    Example: Systat /x c:\data\name1.syc

/c  (.syc or .cmd)
    Opens SYSTAT and loads filename.xxx onto the middle tab of the Commandspace.
    Example: Systat /c "c:\my data\name2.cmd"

/e /x  (.syc or .cmd)
    Opens SYSTAT, submits filename.xxx, and exits the application if file-not-found
    errors are encountered.
    Example: Systat /e /x c:\data\name3.syc

/gscgm  (.cgm)
    Opens SYSTAT, executes any commands the user may give, and on exit automatically
    saves (in CGM format) all graphs in the Output Editor.
    Example: Systat /gscgm "c:\graphs\my graph.cgm"

/elog  (.dat)
    Opens SYSTAT and stores all error messages encountered during command execution
    in filename.xxx.
    Example: systat /elog c:\data\prompt\ErrorLog.dat

/gexit /x  (.syc)
    Opens SYSTAT, submits filename.xxx, and exits the application if no graph is
    generated.
    Example: Systat /gexit /x c:\data\prompt\name4.syc

/m  (.xxx)
    Opens SYSTAT with its window minimized; you can include other switches with this one.
    Example: Systat /m /x c:\data\name5.syc

/out  (.dat)
    Opens SYSTAT, executes any commands the user may give, and on exit saves all the
    text output generated during the session in filename.xxx.
    Example: systat /out c:\data\prompt\testN.dat

/mht  (.mht)
    Opens SYSTAT, executes the command file given with /x, and saves the output in MHT
    format to filename.mht.
    Example: systat /x c:\data\prompt\name6.syc /mht c:\data\prompt\outfile6.mht

(switch illegible in the source)  (.mht)
    Opens SYSTAT, executes the command file given with /x, saves the output in MHT
    format to filename.mht, and quits SYSTAT.


Note: In the command file you submit, any GSAVE, OSAVE, and EXPORT commands
save the graph, output, and data, respectively, to a filename of your choice; these files
can later be used by SYSTAT or other programs for further processing, after this
SYSTAT session has ended.


Command Templates 


Command files provide a method for repeating analyses across SYSTAT sessions. 
Output produced by a particular command file will be identical to output produced by 
any subsequent runs of the same command file (assuming the data do not change). If, 
however, we change the data file in use or replace the variables used for a graph or 
statistical analysis, the results will vary from the original output but still retain the same 
structure. Command templates provide a method for achieving this customizability. 

A command template provides a skeletal framework for graph creation, statistical 
analysis, or file management. The template has the appearance of a standard command
file, but uses tokens in place of filenames, variables, numbers, or strings. Tokens serve 
as substitution markers; a value must be substituted for the token for command 
processing to continue. Every time you submit the command template, you can 
substitute a different value for each token. 




For example, suppose we were to create a template for simple linear regression. This 
model requires a response variable and a predictor variable. We define the model with 
placeholders for these two variables. Substituting empirical variables for these 
placeholders yields regression output for that model. Either or both of these variables 
could be replaced to generate new output using the same general model for different 
data. 

The ampersand character denotes tokens. The text immediately following an ‘&’ 
corresponds to a token name. Token names may contain any number of letters,
digits, underscores, and dollar signs, but the first character after the ampersand must
be a letter or number. Dollar signs do not denote strings and may appear anywhere in 
the token name. As with variable names, token names are not case sensitive. The 
names &tokn, &tOKn, &ToKn, and &TOKN are equivalent; if all of these names 
appear in a template, substituting a value for one of them also substitutes that value for 
the others. 

In some instances, ampersands should not be treated as token indicators. For 
example, the command 


USE JUNE&JULY 


is intended to access the data file JUNE&JULY. However, SYSTAT interprets the & as a
token indicator and prompts the user for replacement text for &JULY. Two methods exist
for avoiding this problematic behavior:


■ If the command file does not involve any token substitution, turn token processing
off by including the line TOKEN / OFF at the beginning of the command file or by
using the General tab of the Global Options dialog. Use TOKEN / ON to reactivate
token processing for subsequent command submissions.

■ If some ampersands denote tokens but others do not, suppress token processing
wherever needed by doubling the ampersand character. For example, replace
JUNE&JULY with JUNE&&JULY. SYSTAT interprets two consecutive
ampersands as a single character rather than a token indicator.
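The token-name rules and the doubled-ampersand escape can be captured in one regular expression. This Python sketch is an illustration of the rules stated above, not SYSTAT's scanner: "&&" is matched first so it is never read as a token, and names are upper-cased because token names are not case sensitive.

```python
import re

# A token is "&" + a letter or digit, then letters, digits, "_" or "$".
# The "&&" alternative comes first so a doubled ampersand stays literal.
TOKEN_RE = re.compile(r"&&|&([A-Za-z0-9][A-Za-z0-9_$]*)")


def find_tokens(line):
    """Return the distinct token names in `line`, upper-cased, in order of appearance."""
    names = []
    for m in TOKEN_RE.finditer(line):
        if m.group(1) is None:  # "&&" is an escaped, literal ampersand
            continue
        name = m.group(1).upper()  # token names are not case sensitive
        if name not in names:
            names.append(name)
    return names


print(find_tokens("USE JUNE&JULY"))   # the problem case: &JULY looks like a token
print(find_tokens("USE JUNE&&JULY"))  # escaped: no tokens
print(find_tokens("PLOT &y*&x"))
```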


As SYSTAT processes commands, token substitution occurs either automatically or 
interactively. In automatic substitution, information supplied in the template replaces 
placeholders as they are encountered. Interactive substitution, on the other hand, 
involves prompting the user for placeholder replacement information. Command 
processing halts until valid information is supplied. 
Automatic Token Substitution


Define tokens for automatic substitution by specifying: 


TOKEN &tok = value 


When SYSTAT encounters &tok during command submission, the defined value 
replaces the token automatically. 

Quotes around token values are NOT included in the replacement value of the 
token. For example: 


TOKEN &str1 'Depression'
BAR dscore / XLAB=&str1 TITLE='Bar graph of &str1'


defines the token &str1 to have a value of Depression. In the bar graph, Depression
appears entirely in capital letters in the x-axis label but not in the title. Because the
token value does not include the quotes, the value can be incorporated into other
strings, as in this graph title. Without quotes, labels appear in upper case, as in this
x-axis label. If quotes around the token value are needed in the command file,
include them explicitly in the command lines.
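The quote-stripping behavior can be sketched as a definition step followed by plain text replacement. This Python model is illustrative only: define_token and substitute are hypothetical helpers, the optional "=" lets it accept both definition forms shown in this section, and it handles value definitions only (not the TYPE option). The later upper-casing of the unquoted x-axis label is SYSTAT's own behavior and is not modeled.

```python
import re

# Token names: "&" + letter or digit, then letters, digits, "_" or "$"; "&&" is literal.
TOKEN_RE = re.compile(r"&&|&([A-Za-z0-9][A-Za-z0-9_$]*)")
DEFINE_RE = re.compile(
    r"\s*TOKEN\s+&([A-Za-z0-9][A-Za-z0-9_$]*)\s*=?\s*(.+?)\s*", re.IGNORECASE
)


def define_token(line, table):
    """Record a TOKEN &name [=] value definition, dropping surrounding quotes."""
    m = DEFINE_RE.fullmatch(line)
    if m is None:
        raise ValueError(f"not a token definition: {line!r}")
    value = m.group(2)
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
        value = value[1:-1]  # the quotes are not part of the replacement value
    table[m.group(1).upper()] = value  # token names are not case sensitive


def substitute(line, table):
    """Replace every defined token in `line` with its value."""
    def replace(m):
        if m.group(1) is None:  # "&&" -> literal "&"
            return "&"
        name = m.group(1).upper()
        if name not in table:
            raise KeyError(f"&{m.group(1)} is undefined; SYSTAT would prompt for it")
        return table[name]
    return TOKEN_RE.sub(replace, line)


tokens = {}
define_token("TOKEN &str1 'Depression'", tokens)
print(substitute("BAR dscore / XLAB=&str1 TITLE='Bar graph of &str1'", tokens))
```

The printed line shows Depression inserted unquoted after XLAB= and inside the quoted title, which is why only the title keeps its original capitalization.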


Interactive Token Substitution 


To prompt the user for a token substitution value, precede the token text with an 
ampersand in the command file. During processing, when SYSTAT initially 
encounters the token, a dialog prompts for a replacement value. 


[Figure: Replace String dialog]


Entering a value and pressing the Continue button allows processing to continue. 
Pressing the Cancel button halts further submission of the command file. 
If subsequent commands use a token that has already been assigned a value,
SYSTAT substitutes that value automatically. For example, the command: 


PLOT &y*&x 


results in a dialog prompting for the tokens &y and &x. Suppose the current file has
variables named AGE and DEPRESS. If we assign DEPRESS to &y and AGE to &x, the 
resulting graph plots depression score versus age. If the command file continues with: 


REGRESS 
MODEL &Y = CONSTANT + &X 
ESTIMATE 


SYSTAT computes the regression of depression score on age without prompting for 
substitution values. 


Validating Input. The Token Substitution dialog accepts any value supplied by the 
user. However, commands typically require numbers, strings, or filenames to execute 
correctly. To impose restrictions on token replacement values, define tokens using the 
TOKEN command with the TYPE option, as follows: 


TOKEN &tok1 / TYPE = tokentype


Valid tokentype values include: MESSAGE, OPEN, SAVE, VARIABLE, NVARIABLE, 
CVARIABLE, MULTIVAR, NMULTIVAR, CMULTIVAR, STRING, NUMBER, and INTEGER. 


During processing, when a token is encountered, SYSTAT scans for a definition. If 
SYSTAT finds an associated TOKEN definition, a dialog consistent with the token type 
appears. Otherwise, a default dialog prompts the user for information. 


Resetting Tokens. Tokens can be reset individually or globally. To clear all tokens, use 
TOKEN without arguments or options. Any tokens used in subsequent command lines 
result in prompting for replacement values. 

To reset an individual token, redefine the token using a new TOKEN command. For 
example, 


BAR &y*&x 
TOKEN &x 
DOT &y*&x 


initially prompts for two token values. DOT, however, only prompts for a value for &x, 
the token reset between the BAR and DOT commands. 
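The prompting, caching, and resetting behavior described in the last two subsections can be modeled as a small session object. In this Python sketch the prompt argument is a hypothetical stand-in for the substitution dialog; it is called only the first time a token is seen, and reset mirrors TOKEN &name (clear one token) and TOKEN with no arguments (clear all).

```python
import re

TOKEN_RE = re.compile(r"&&|&([A-Za-z0-9][A-Za-z0-9_$]*)")


class TokenSession:
    """Caches token values for the rest of a run, prompting only for new tokens."""

    def __init__(self, prompt):
        self.prompt = prompt  # callable(name) -> str; stands in for the dialog
        self.values = {}

    def reset(self, name=None):
        """TOKEN with no arguments clears all tokens; TOKEN &name clears one."""
        if name is None:
            self.values.clear()
        else:
            self.values.pop(name.lstrip("&").upper(), None)

    def substitute(self, line):
        def replace(m):
            if m.group(1) is None:  # "&&" is a literal ampersand
                return "&"
            name = m.group(1).upper()
            if name not in self.values:  # first use: ask the user once
                self.values[name] = self.prompt(name)
            return self.values[name]
        return TOKEN_RE.sub(replace, line)


answers = {"Y": "DEPRESS", "X": "AGE"}
asked = []


def fake_prompt(name):
    asked.append(name)
    return answers[name]


session = TokenSession(fake_prompt)
print(session.substitute("BAR &y*&x"))  # prompts for Y and X
session.reset("&x")                      # TOKEN &x
print(session.substitute("DOT &y*&x"))  # prompts for X only
print(asked)
```

Because values are cached under upper-cased names, a later MODEL &Y = CONSTANT + &X line would be filled in without any further prompting.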
Message Tokens


In contrast to all other token types, message tokens do not function as substitution 
markers. Instead, the message token yields a dialog designed to provide the user with 
information about the template. To define a message token, include a command line 
having the following form in your command file: 


TOKEN &msg/ TYPE=MESSAGE PROMPT="Prompting text appears here." 


Figure: Information dialog displaying the prompting text of a message token. 


Information commonly included in the prompting text: 
• the result of running the template file. 
• changes to the data file, if any. 
• the state of SYSTAT when template processing completes. 


When command processing begins, SYSTAT immediately displays the prompting text 
for a message token in a dialog. Based on this text, the user can elect to continue or 
cancel processing. Pressing Cancel halts processing; no further commands in the 
template are executed. 
Filename Tokens 


Filename tokens represent any file that SYSTAT can open or save, including data files, 
command files, and output files. To substitute a filename for a token, specify one of the 
following: 


TOKEN &file / TYPE=OPEN 
TOKEN &file / TYPE=SAVE 


When SYSTAT encounters the token &file in the command file, a dialog prompting the 
user for a filename appears. SYSTAT substitutes the name of and path to the selected 
file for the corresponding token. 


Use the OPEN type when opening data files or submitting command files. For 
example: 


TOKEN &datafile / TYPE=OPEN 
TOKEN &cmdfile / TYPE=OPEN 
USE &datafile 

SUBMIT &cmdfile 
Figure: Open dialog listing SYSTAT data files (*.syd, *.sys, *.syz), displayed by the 
following commands: 

TOKEN &file / TYPE=OPEN PROMPT='Open - Specify a file for the token &file' 
USE &file 


Use the SAVE type for saving output, data, or graphs to files. For example: 


TOKEN &gphfile / TYPE=SAVE 
TOKEN &outfile / TYPE=SAVE 
PLOT Y*X 
GSAVE &gphfile / BMP 
OSAVE &outfile / HTML 
Figure: Save As dialog for SYSTAT data files (*.syd, *.sys, *.syz), displayed by the 
following commands: 

TOKEN &file / TYPE=SAVE PROMPT='Save As - Specify a file for the token &file' 
DSAVE &file 


To add an instructional title to the dialog, use the PROMPT option. The specified 
prompt text appears in the title bar of the dialog, so keep it short enough to fit there. 


Single Variable Tokens 
To substitute a single variable for a token, specify one of the following: 


TOKEN &var / TYPE=VARIABLE 
TOKEN &var / TYPE=CVARIABLE 
TOKEN &var / TYPE=NVARIABLE 


When SYSTAT encounters the token &var in the command file, a dialog prompting the 
user to select a variable appears. If no data file is currently open, SYSTAT prompts the 
user to open a file before proceeding to variable selection. 
Figure: Replace Variable dialog, displayed by the following commands: 

USE ourworld 
TOKEN &var / TYPE=VARIABLE IMMEDIATE 

Select a variable and click Add. Click Continue to continue command processing. The 
list of available variables depends on the token type: TYPE=CVARIABLE lists only 
string variables, TYPE=NVARIABLE lists only numeric variables, and TYPE=VARIABLE 
lists all variables. 


Multiple Variable Tokens 


To substitute multiple variables for a single token, specify one of the following: 


TOKEN &var / TYPE=MULTIVAR 
TOKEN &var / TYPE=CMULTIVAR 
TOKEN &var / TYPE=NMULTIVAR 


When SYSTAT encounters the token &var in the command file, a dialog prompting the 
user to select multiple variables appears. If no data file is currently open, SYSTAT 
prompts the user to open a file before proceeding to variable selection. 
Figure: Replace Variable dialog with the prompt "Replace &VAR with:" and a list of 
available variables (including COUNTRY$, POP_1983, POP_1986, and BABYMORT), 
displayed by the following commands: 

USE ourworld 
TOKEN &var / TYPE=MULTIVAR IMMEDIATE 


Select one or more variables and click Add to include the variable(s) in the token 
replacement set. To select multiple consecutive variables, hold down the Shift key and 
click the first and last variables in the desired set. To select multiple nonconsecutive 
variables, hold down the Ctrl key and click each variable before clicking Add. Click 
Continue to continue command processing. 

The list of available variables depends on the token type: TYPE=MULTIVAR lists all 
variables, TYPE=CMULTIVAR lists only string variables, and TYPE=NMULTIVAR lists 
only numeric variables. 

By default, during multiple variable substitution, SYSTAT inserts a space between 
the selected variables. To specify an alternative character, use the SEPARATOR option 
of the TOKEN command. 


TOKEN &var / TYPE=NMULTIVAR SEPARATOR='char' 


Replace char with the desired single character separator. SYSTAT truncates separators 
longer than one character to the first character. The designated character does not 
appear before the first variable or after the last variable. 
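The separator rules above can be expressed as a small Python function. `join_selection` is a hypothetical helper. The text does not say whether SYSTAT adds spaces around the separator, so this sketch adds none.

```python
def join_selection(variables, separator=" "):
    """Build a MULTIVAR replacement value (hypothetical helper).

    - the default separator is a single space
    - a longer separator is truncated to its first character
    - no separator appears before the first or after the last variable
    """
    return separator[:1].join(variables)


print(join_selection(["POP_1983", "POP_1986", "BABYMORT"]))
# POP_1983 POP_1986 BABYMORT
print(join_selection(["POP_1983", "POP_1986"], separator="+-"))
# POP_1983+POP_1986
```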
String Tokens 
To substitute a text string for a token, specify: 


TOKEN &text / TYPE=STRING 


When SYSTAT encounters the token &text in the command file, a dialog prompting 
the user for a string appears. 


Figure: Replace String dialog with the prompt "Replace &TEXT with:", displayed by the 
following command: 

TOKEN &text / TYPE=STRING IMMEDIATE 


Type the desired text string. The entire string, including any quotes entered as part of 
the string, replaces the token. For instance, if a plot command contains a string token 
as an option: 


PLOT Y*X / &text 


you can enter a list of options such as 


XLAB='X Variable' YLAB='Y variable' SYMBOL=2 


as replacement text for the token. Alternatively, to prompt for each option setting, 
assign each to a separate token: 


PLOT Y*X / XLAB='&text1' YLAB=&text2 SYMBOL=&symnum 


Notice the tokens for the axis label strings in the preceding command line. For the 
x-axis, quotes enclose the token. In this arrangement, the token replacement value 
should not include quotes, but should only contain the text used to label the axis. In 
contrast, for the y-axis, the token is not enclosed in quotes. The appearance of this axis 
label depends on whether the quotes are included in the token replacement value: 


• Typing Response results in a label of RESPONSE. Without quotes, SYSTAT 
displays labels in capital letters. 

• Typing 'Response' results in a label of Response. Because the command line does 
not include quotes around the token for the y-axis label, quotes must be included 
in the replacement value for the label to match the case of the supplied text string. 
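The cases above can be traced with a short Python model. It assumes, as the text describes, that quoted label text keeps its case and unquoted text is shown in capitals. `label_for` and `displayed_label` are hypothetical helpers, not SYSTAT functions.

```python
def displayed_label(option_text):
    """Label shown for an option value after substitution (assumed rule):
    quoted text keeps its case, unquoted text is shown in capitals."""
    text = option_text.strip()
    if len(text) >= 2 and text[0] == text[-1] and text[0] in "'\"":
        return text[1:-1]
    return text.upper()


def label_for(option_template, token, replacement):
    """Substitute the token into the option, then apply the display rule."""
    return displayed_label(option_template.replace(token, replacement))


# y-axis: YLAB=&text2 (token not quoted in the command line)
print(label_for("&text2", "&text2", "Response"))    # RESPONSE
print(label_for("&text2", "&text2", "'Response'"))  # Response
# x-axis: XLAB='&text1' (token already quoted)
print(label_for("'&text1'", "&text1", "Response"))  # Response
```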


Numeric Tokens 
To substitute a numeric value for a token, specify one of the following: 


TOKEN &num / TYPE=NUMBER 
TOKEN &num / TYPE=INTEGER 


When SYSTAT encounters the token &num in the command file, a dialog prompting 
the user for a number or integer appears. 


Figure: Replace Number dialog with the prompt "Replace &NUM with:", displayed by the 
following command: 

TOKEN &num / TYPE=NUMBER IMMEDIATE 


After entering a value, press Continue. If the value is not numeric, an error occurs and 
the user is prompted again. Likewise, attempts to input a decimal value for an integer 
result in re-prompting. The prompting dialog continues to appear until a valid value is 
entered or the Cancel button is pressed. 
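The following sketch shows this validation loop. It uses Python's own number parsing as a stand-in for SYSTAT's, which may accept different formats. `parse_value` and `prompt_until_valid` are hypothetical. A `None` entry in the response list stands for pressing Cancel.

```python
import math


def parse_value(text, token_type):
    """Return the parsed value, or None if it is invalid for the token type."""
    try:
        if token_type == "INTEGER":
            return int(text)   # "3.5" or "abc" raise ValueError
        value = float(text)    # TYPE=NUMBER accepts decimals
    except ValueError:
        return None
    return value if math.isfinite(value) else None


def prompt_until_valid(responses, token_type):
    """Re-prompt until a valid entry; None in responses means Cancel."""
    for text in responses:
        if text is None:
            raise RuntimeError("processing cancelled")
        value = parse_value(text, token_type)
        if value is not None:
            return value
        # invalid entry: the dialog would appear again
    raise RuntimeError("no more responses")


print(prompt_until_valid(["abc", "2.5", "3"], "INTEGER"))  # 3
print(prompt_until_valid(["1.88"], "NUMBER"))              # 1.88
```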
Custom Prompts 


By default, the instruction appearing in substitution dialogs states “Replace &tok 
with:”. To assist the user in entering valid information for a token, replace the default 
instruction with a custom prompt using the PROMPT option of the TOKEN command. 
For example, to prompt the user for a graph title, use 


TOKEN &title1 / PROMPT='Enter the graph title:' 


When SYSTAT encounters &title1, the following dialog appears: 


Figure: Information dialog displaying the prompt "Enter the graph title:", displayed by 
the following command: 

TOKEN &title1 / TYPE=MESSAGE PROMPT='Enter the graph title:' IMMEDIATE 


Custom prompts can include carriage returns in the prompting text, allowing you to 
define the text appearing on each line of a multi-line prompt. For example: 


TOKEN &var1/ TYPE=VARIABLE, 
PROMPT='This is the first line, 
this is the second, and, 
this is the third' 


results in a three-line prompt. In the absence of carriage returns, SYSTAT 
automatically wraps prompting text to fit the dialog. Although the dialogs for string, 
number, and integer replacement have no practical limit on the number of lines that can 
be used as a prompt, the dialogs for variable selection limit custom prompts to three 
lines of text. 
Dialog Sequences 


Processing of command files begins at the first line of the file and continues to the last 
line. SYSTAT does not prompt for token replacement values until the token being 
defined is encountered in a command line, unless the IMMEDIATE option is specified. 
This can result in undesirable sequences of prompting dialogs. Consider the following 
set of commands: 


TOKEN &xvar / TYPE=VARIABLE 

TOKEN &xvarlabel / TYPE=STRING 

TOKEN &yvar / TYPE=VARIABLE 

TOKEN &yvarlabel / TYPE=STRING 

PLOT &yvar*&xvar / YLAB=&yvarlabel XLAB=&xvarlabel 


First, SYSTAT prompts for &yvar, the y-variable in the scatterplot. Next, a prompt for 
the x-variable appears. Prompting continues by asking for a label for the y-axis and 
finally for a label for the x-axis. Notice that the dialog sequence does not correspond 
to the order of the TOKEN statements, but instead corresponds to the ordering of the 
actual tokens in the PLOT command. 


Rather than prompting in the order that the tokens are encountered, you can define a 
sequence for dialog prompting using the IMMEDIATE option. Instead of prompting 
when encountering the token, the prompting dialog appears when SYSTAT processes 
the TOKEN statement. For example, to prompt for the y-variable, the y-axis label, the 
x-variable, and the x-axis label, in that order, specify the following: 


TOKEN &yvar / TYPE=VARIABLE IMMEDIATE 

TOKEN &yvarlabel / TYPE=STRING IMMEDIATE 

TOKEN &xvar / TYPE=VARIABLE IMMEDIATE 

TOKEN &xvarlabel / TYPE=STRING IMMEDIATE 

PLOT &yvar*&xvar / YLAB=&yvarlabel XLAB=&xvarlabel 


In this case, SYSTAT prompts for information in the order of the TOKEN statements, 
rather than in the order that the tokens themselves appear. 


Note: SYSTAT always processes MESSAGE tokens first; these tokens do not require 
the IMMEDIATE option. 
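The ordering rules can be summarized in a small model. This simplified sketch assumes all TOKEN statements precede the command lines, as in both examples above. `prompt_order` is a hypothetical helper that returns token names in the order their dialogs would appear.

```python
import re

TOKEN_REF = re.compile(r"&(\w+)")


def prompt_order(token_defs, command_lines):
    """Order in which token dialogs appear (simplified, hypothetical model).

    token_defs:    (name, type, immediate) tuples in file order
    command_lines: non-TOKEN command lines in file order
    Assumes every TOKEN statement precedes the command lines.
    """
    # MESSAGE tokens are always processed first.
    order = [n.upper() for n, t, _ in token_defs if t == "MESSAGE"]
    # IMMEDIATE tokens prompt when their TOKEN statement is processed.
    order += [n.upper() for n, t, imm in token_defs if imm and t != "MESSAGE"]
    # All other tokens prompt when first encountered in a command line.
    for line in command_lines:
        for match in TOKEN_REF.finditer(line):
            name = match.group(1).upper()
            if name not in order:
                order.append(name)
    return order


plot = ["PLOT &yvar*&xvar / YLAB=&yvarlabel XLAB=&xvarlabel"]
deferred = [("xvar", "VARIABLE", False), ("xvarlabel", "STRING", False),
            ("yvar", "VARIABLE", False), ("yvarlabel", "STRING", False)]
immediate = [("yvar", "VARIABLE", True), ("yvarlabel", "STRING", True),
             ("xvar", "VARIABLE", True), ("xvarlabel", "STRING", True)]
print(prompt_order(deferred, plot))   # ['YVAR', 'XVAR', 'YVARLABEL', 'XVARLABEL']
print(prompt_order(immediate, plot))  # ['YVAR', 'YVARLABEL', 'XVAR', 'XVARLABEL']
```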
Viewing Tokens 


As you develop your own library of templates, you may find it useful to have one 
template file submit another. However, if tokens have the same name in the two 
files, undesired output can result. To help resolve token conflicts, you can list all 
current tokens with their defining characteristics by specifying 


TOKEN / LIST 


The list includes predefined tokens as well as user-defined tokens. For each token, 
SYSTAT displays: 

• the token 
• the type 
• the current assigned value 
• the text appearing in the prompting dialog 

Generating this listing for each template identifies tokens common to both files. 


Differences should be examined closely; two tokens sharing a name but defined as 
different types are likely to yield odd behavior. 


Predefined tokens 


SYSTAT's default file locations as specified in the Global Options dialog are assigned 
to built-in tokens as follows: 


Token Name   Token Value 
&EXPORT      Folder to which data will be exported 
&GET         Folder containing ASCII data for import by BASIC 
&GSAVE       Folder to which graphs will be saved 
&IMPORT      Folder from which data will be imported 
&OSAVE       Folder to which SYSTAT output will be saved 
&OUTPUT      Folder to which ASCII output will be saved 
&PUT         Folder to which ASCII data will be exported by BASIC 
&ROOT        Folder in which SYSTAT is installed 
&SAVE        Folder in which SYSTAT data files will be saved 
&SUBMIT      Folder from which SYSTAT command files will be submitted 
&USE         Folder from which SYSTAT data files will be opened 
&WORK        Folder to which temporary SYSTAT data files will be saved 
You can use these tokens in your command scripts to open files from, or save files to, 
these folders. Although SYSTAT searches these folders by default, files with the same 
name may exist in different folders, and you may need to open a specific one. For 
example, if you issue SAVE mydata, the data file MYDATA is saved in the &SAVE 
folder. Suppose a file named MYDATA also exists in the &USE folder. When you issue 
USE mydata, SYSTAT by default opens the file in the &USE folder, because &USE 
takes precedence over &SAVE. To open the file in the &SAVE folder, either issue the 
USE command with the full path or use: 
USE &SAVE\mydata 
Refer to Chapter 7, Customization of the SYSTAT environment, for details about 
SYSTAT's file locations. 
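The following hypothetical sketch models the lookup order described above. It assumes that a bare file name is searched for in the &USE folder and then the &SAVE folder. A name that already contains a folder is used as given. The real program also handles file extensions and other default locations, which this sketch omits. `resolve_data_file` is an illustrative name.

```python
from pathlib import Path


def resolve_data_file(name, use_dir, save_dir):
    """Hypothetical lookup: search &USE, then &SAVE, for a bare file name.

    A name that already includes a folder (for example, one built from
    &SAVE\\mydata) is returned as given.
    """
    path = Path(name)
    if path.parent != Path("."):
        return path
    for folder in (use_dir, save_dir):  # &USE takes precedence over &SAVE
        candidate = Path(folder) / path
        if candidate.exists():
            return candidate
    return None
```

If MYDATA exists in both folders, the bare name resolves to the &USE copy. Only a name that already includes the &SAVE folder reaches the other copy.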

Examples 


The examples presented here illustrate some practical implementations of token 
substitution. For more examples, examine the command files used in the Graph 
Gallery. 
Example 1 
Automatic Substitution in Exploratory Analysis 


In this example, automatic token substitution defines the input file to use. SYSTAT 
then prompts for a variable and creates a bar graph. 


TOKEN &infile = survey2 


TOKEN &catvar / TYPE=VARIABLE, 
PROMPT='Select the variable appearing in the bar graph.' 


USE '&infile' / NONAMES 
NOTE 'File in use = &infile' 
CATEGORY &catvar 

BAR &catvar 

CATEGORY &catvar/OFF 


The path to the file contains spaces and must therefore be enclosed in quotes when 
defining the token. However, the quotes appearing in the token definition are not 
included in the token value. To direct SYSTAT to the correct path, we use quotes 
around the token in the USE command. Without those quotes, the program would look 
for a file named program and would return an error. 

Repeated submissions of this template allow rapid creation of exploratory bar charts 
to study the distributions of variables in the SURVEY2 file. Because of the automatic 
substitution, we are not prompted for a data file on each submission. To change data 
files, replace the path and the file in the first TOKEN command in the template. The 
note appearing in the output automatically updates to reflect the new file. 


Example 2 
Token Substitution for Variables and Strings 


Variable substitution allows templates to be used for any data file. The resulting output 
has the same general structure, but varies in its content. String, number, and integer 
substitution allows customization, giving output from different files unique features. 
Here, we create a three-dimensional scatterplot. The string tokens provide custom 
labels and a title to help differentiate the plot from other 3D plots generated from other 
submissions of this template. 


TOKEN &xvar / TYPE=NVARIABLE IMMEDIATE, 
PROMPT='Select a variable for the x-axis. ' 


TOKEN &xvarlab / TYPE=STRING IMMEDIATE, 
PROMPT='Enter a label for the x-axis:' 


TOKEN &yvar / TYPE=NVARIABLE IMMEDIATE, 
PROMPT='Select a variable for the y-axis.' 


TOKEN &yvarlab / TYPE=STRING IMMEDIATE, 
PROMPT='Enter a label for the y-axis:' 


TOKEN &zvar / TYPE=NVARIABLE IMMEDIATE, 
PROMPT='Select a variable for the z-axis.' 


TOKEN &zvarlab / TYPE=STRING IMMEDIATE, 
PROMPT='Enter a label for the z-axis:' 


TOKEN &pltitle / TYPE=STRING, 
PROMPT='Enter a title for the plot:' 


TOKEN &symlabel / TYPE=CVARIABLE, 
PROMPT='Select a variable to use for labeling the plot points.' 


TOKEN &symsize / TYPE=NVARIABLE, 
PROMPT='Select a variable to use for sizing the plot points.' 


PLOT &zvar*&yvar*&xvar / SIZE=&symsize LABEL=&symlabel, 
TITLE='&pltitle', 
XLAB='&xvarlab' YLAB='&yvarlab' ZLAB='&zvarlab' 


We use the IMMEDIATE option to ensure that the axis labeling prompts occur 
immediately after the corresponding axis assignment. 

In the PLOT command, we enclose the string tokens in quotation marks. Doing so 
preserves the case of the entered value and prevents potential syntax errors resulting 
from spaces in the replacement text. 


Variable Creation 


The VARIABLE, NVARIABLE, CVARIABLE, MULTIVAR, NMULTIVAR, and CMULTIVAR 
types of the TOKEN command allow the user to select a variable or variables from 
those found in the current data file. These types cannot be used to create new variables. 
Instead, use the STRING type for variable creation. 
In this example, we create ten new variables. Each variable contains 100 cases 
drawn randomly from a standard normal distribution. 


TOKEN &v / TYPE=STRING, 
PROMPT='Enter a name for the new variables., 
Names should be 12 characters long or less.' 


NEW 
DIM &v(10) 
REPEAT 100 
FOR i=1 TO 10 
LET &v(i)=ZRN () 
NEXT 


The DIM statement reserves memory for ten subscripted variables, assigning a root 
name supplied by the user. REPEAT generates 100 cases. The FOR..NEXT loop assigns 
standard normal deviates to each of the ten variables. 


Notice that although we are dealing with variables, the VARIABLE type refers to 
existing variables and thus cannot be used for our purposes, namely to create new 
variables. 


Example 3 


Token Substitution for Numbers and Integers 


The following commands generate a t-distribution with a reference line at a specified 
location. The output includes the cumulative area up to, and the probability of 
obtaining a value as extreme as, the given value. 


TOKEN &df / TYPE=INTEGER PROMPT='Enter the degrees of freedom, 
for the t-distribution.' 
TOKEN &tval / TYPE=NUMBER PROMPT='Enter a t value.' 
FPLOT Y=TDF(t, &df); XLIMIT=&tval XLAB='t' YLAB='Density', 
TITLE='t Distribution with &df DF' 
tarea = TCF(&tval, &df) 
PRINT "Area to the left of &tval:", tarea 
IF (&tval >= 0) then pval = (2*(1-tarea)) 
IF (&tval < 0) then pval = (2*tarea) 
PRINT "Two-tailed p-value: ", pval 
The degrees of freedom for a t-distribution must be an integer, so we restrict the 
corresponding token to accept values of this type. However, t-values can be decimal 
numbers, so the t-value token only requires a number (TYPE=NUMBER) rather than an 
integer. The template uses the two tokens to compute the desired statistics. In addition, the 
&df token is used to generate a function plot and to title the plot. The other token, 
&tval, appears as a reference line in the function plot and in the output messages. The 
output using a value of 1.88 for a t-distribution having 3 degrees of freedom follows: 


t Distribution with 3 DF 


Area to the left of 1.88 = 0.922 
Two-tailed p-value = 0.157 
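You can cross-check these figures outside SYSTAT with the following Python sketch. It recomputes them by integrating the t density numerically with Simpson's rule. It does not call SYSTAT's TCF function. `t_cdf` and `two_tailed_p` are illustrative names, and the p-value mirrors the template's two IF statements.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(t, df, steps=2000):
    """Area to the left of t: Simpson's rule on [0, |t|] plus symmetry.

    steps must be even for Simpson's rule.
    """
    a = abs(t)
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    half_area = total * h / 3
    return 0.5 + half_area if t >= 0 else 0.5 - half_area


def two_tailed_p(t, df):
    """Same branches as the template's IF statements."""
    area = t_cdf(t, df)
    return 2 * (1 - area) if t >= 0 else 2 * area


print(round(t_cdf(1.88, 3), 3))         # 0.922
print(round(two_tailed_p(1.88, 3), 3))  # 0.157
```

After rounding to three decimals, it reproduces the area and p-value shown above for 1.88 with 3 degrees of freedom.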


Example 4 
Normal Random Deviates Using Tokens 


No other distribution has received more attention or been used more often than the 
normal. In keeping with this trend, we use tokens to generate random deviates from a 
normal distribution with a user-specified mean and standard deviation. The user also 
indicates the number of deviates to create. The final command plots the normal 
distribution. 


TOKEN &num / TYPE=INTEGER, 
PROMPT='How many standard normal random observations should be 
generated?' 


TOKEN &mean / TYPE=NUMBER, 
PROMPT='What is the mean for the normal distribution?' 


TOKEN &stdev / TYPE=NUMBER, 
PROMPT='What is the standard deviation for the normal, 
distribution?' 
NEW 
REPEAT &num 
LET nrd=ZRN(&mean, &stdev) 
DENSITY nrd / NORMAL 


This template saves the generated deviates to a new variable named NRD. 
Alternatively, you could use another token to prompt the user to specify a name for the 
new variable. 
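For comparison, the same generation step can be sketched in Python with `random.gauss`. It is an analogue of `REPEAT &num` and `LET nrd=ZRN(&mean, &stdev)`, not SYSTAT's generator, so the individual values will differ. The `normal_deviates` name and its `seed` parameter are assumptions.

```python
import random
import statistics


def normal_deviates(num, mean, stdev, seed=None):
    """Python analogue of REPEAT &num / LET nrd=ZRN(&mean, &stdev).

    Uses Python's generator, not SYSTAT's, so values will differ.
    """
    rng = random.Random(seed)
    return [rng.gauss(mean, stdev) for _ in range(num)]


nrd = normal_deviates(100, 50, 10, seed=1)
print(len(nrd), round(statistics.mean(nrd), 1))
```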


Example 5 
Random Number Generation Using Tokens 


In this example, we combine interactive and automatic token substitution to generate 
random deviates from one of four distributions: Uniform, Normal, Exponential, or 
Logistic. 


TOKEN &rndnum='rndnum' 
TOKEN &RN='RN()' 
TOKEN &dist / TYPE=STRING IMMEDIATE, 
PROMPT='Select a distribution by entering a letter., 
(U=Uniform; Z=Normal; E=Exponential; L=Logistic), 
Default parameter values = Li T: A 
TOKEN &num / TYPE=INTEGER, 
PROMPT='How many random observations should be generated?' 


NEW 

REPEAT &num 

LET &dist&rndnum=&dist&RN 
DENSITY &dist&rndnum / FILL=.5 
The &dist token yields a dialog prompting for a single letter. We use the IMMEDIATE 

option to prevent the prompt for the number of observations from appearing first. 
The LET statement combines three tokens to yield one transformation statement. A 

closer examination of this statement reveals some of the subtleties of token processing: 


• First, we need a replacement value for &dist. Due to the IMMEDIATE option, this 
token already has a replacement value (U, Z, E, or L), so processing continues. 
Suppose the entered value equals U. 

• Next, we encounter the &rndnum token. The first TOKEN statement assigns this 
token a value of rndnum. As a result, the left side of the LET statement becomes 


LET Urndnum 


After the equals sign, we again find the &dist token, which has a value of U. 

• The final token on this line, &RN, has an assigned value of 'RN()', resulting in the 
following valid transformation statement (after token substitution): 


LET Urndnum = URN() 


The template creates a new variable with a seven-character name. The first character 
of the name denotes the distribution used to generate the values, and the final six 
indicate that the entries correspond to random numbers. 
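The expansion of adjacent tokens can be traced with a short Python model. It assumes a token name ends at the first character that is not a letter, digit, or underscore, so the second `&` in `&dist&rndnum` starts a new token. `expand` is a hypothetical helper.

```python
import re


def expand(line, values):
    """Replace every &name with its assigned value (hypothetical model).

    A token name is assumed to end at the first character that is not a
    letter, digit, or underscore, so in &dist&rndnum the second "&" starts
    a new token. Names are matched case-insensitively.
    """
    return re.sub(r"&(\w+)", lambda m: values[m.group(1).upper()], line)


values = {"RNDNUM": "rndnum", "RN": "RN()", "DIST": "U"}
print(expand("LET &dist&rndnum=&dist&RN", values))       # LET Urndnum=URN()
print(expand("DENSITY &dist&rndnum / FILL=.5", values))  # DENSITY Urndnum / FILL=.5
```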


The output after randomly generating 100 observations from a uniform distribution 
follows: 


Figure: density plot of URNDNUM for 100 uniform random deviates (Count on the 
vertical axis, URNDNUM on the horizontal axis). 
Example 6 
Multiple Variable Substitution 


The number of variables analyzed often varies across applications of a particular 
technique. For instance, one regression model may include two variables, but another 
may include four. We can create a template for each model as follows: 


TOKEN / ON 

TOKEN &file/TYPE= OPEN PROMPT = "Choose a file to run Regression" 
TOKEN &resp/TYPE= variable prompt= "select the dependent variable, 
of the model" 


TOKEN &v1/TYPE = variable prompt = "select a variable" 
TOKEN &v2/TYPE = variable prompt = "select a variable" 
TOKEN &v3/TYPE = variable prompt = "select a variable" 
TOKEN &v4/TYPE = variable prompt = "select a variable" 
USE &file 
REM Two predictors 
REGRESS 
MODEL &resp = CONSTANT + &v1 + &v2 
ESTIMATE 
REM Four predictors 
REGRESS 
MODEL &resp = CONSTANT + &v1 + &v2 +, 
&v3 + &v4 
ESTIMATE 


Unfortunately, although these templates apply linear regression to user-specified 
variables, they apply only to models involving two and four predictors, 
respectively. 

To create templates allowing for a varying number of variables, use the MULTIVAR, 
NMULTIVAR, and CMULTIVAR token types. Here, we create a linear regression template 
allowing any number of predictors and generate hypothesis tests to determine whether 
coefficients equal zero. 


TOKEN &resp / TYPE = NVARIABLE, 
PROMPT = 'Select the response variable.' 
TOKEN &predictors / TYPE = NMULTIVAR SEPARATOR = '+', 
PROMPT = 'Select the predictor variables, 
for the multiple regression model.' 
TOKEN &hypeff / TYPE = NMULTIVAR SEPARATOR = '&', 
PROMPT='Select predictors whose coefficients, 
you wish to test for differences from 0.' 


REGRESS 

MODEL &resp = CONSTANT + &predictors 
ESTIMATE 

HYPOTHESIS 

ALL 

TEST 

HYPOTHESIS 

EFFECT &hypeff 

TEST 

TOKEN /OFF 


The &predictors token represents all predictors in the model. The user selects the 
variables to include and SYSTAT generates the token value by inserting a '+' between 
them, yielding a valid MODEL statement. 

The first HYPOTHESIS command generates a test for each coefficient in the model. 
The second HYPOTHESIS omits the selected variables from the regression model and 
compares the result with the original model. The EFFECT statement for this test 
requires an ampersand between terms, so we define the separator for this token to be 
'&'. 


Example 7 
Graph Option Template 


The Graph tab of the Global Options dialog defines several appearance features for 
subsequently created graphs. As an alternative, the following template prompts for 
scaling percentages, line thickness, and character size before submitting a command 
file. As a result, all graphs created by the specified file use common values for these 
three global graph characteristics. 


TOKEN / ON 
TOKEN &xyscale /TYPE=INTEGER, 
PROMPT='Enter the % reduction or enlargement for graphs., 
Values below 100 result in reduction., 
Values above 100 result in enlargement. ' 
TOKEN &charsize / TYPE=NUMBER, 
PROMPT='Enter the factor by which to scale graph characters., 
A value of 2 doubles the character size., 
A value of .5 halves the character size. ' 
TOKEN &linethickness / TYPE=NUMBER, 
PROMPT='Enter the factor by which to scale line thickness., 
A value of 2 doubles the line thickness., 
A value of .5 halves the line thickness. ' 


TOKEN &cmdfile / TYPE=OPEN, 
PROMPT='Open a command file for creating graphs' 
SCALE &xyscale &xyscale 
CSIZ &charsize 
THICK &linethickness 
SUBMIT &cmdfile 
SCALE 
CSIZE 
THICK 


The final three commands return the global options to their default settings. 


Example 8 
Combining Analyses -- Two-Way ANOVA 


Menus and dialogs offer a prescribed set of options resulting in a variety of statistics 
and graphs. When performing a series of analyses or including graphs with statistical 
output, using token substitution simplifies the process considerably. For example, 
multidimensional scaling requires a matrix input. You could generate this matrix from 
a rectangular file using the CORR procedure before running MDS. You could then save 
the final configuration for custom plotting. Instead of running each procedure 
separately, however, we can automate the entire process using a template. You can 
apply the template to any data to generate output customized to your needs. 


In this example, we focus on two-way ANOVA. Using four tokens, we generate: 

• box plots displaying the distribution of the dependent variable for every level of 
each factor. 
• analysis of variance results. 
• post-hoc tests for main and interaction effects. 
• an interaction plot displaying the dependent variable mean in each 
cross-classification of the two factors. 
• a residual plot. 
• a stem-and-leaf plot of the residuals. 


USE OURWORLD 

TOKEN / ON 
TOKEN &outfile / TYPE=SAVE PROMPT='Save ANOVA Statistics' 
TOKEN &factor1 / TYPE=variable, 
PROMPT='What is the first factor?' 
TOKEN &factor2 / TYPE=variable, 
PROMPT='What is the second factor?' 
TOKEN &dep / TYPE=variable, 
PROMPT='What is the dependent variable?' 
NOTE 'Two-way Analysis of Variance of' 
NOTE '&dep using &factor1 and &factor2 as factors' 
DENSITY &dep * &factor1 &factor2 / BOX 
ANOVA 
CATEGORY &factor1 &factor2/REPLACE 
DEPEND &dep 
SAVE &outfile / RESID DATA 
ESTIMATE 

HYPOTHESIS 
POST &factor1/ SCHEFFE 
TEST 

HYPOTHESIS 
POST &factor2/ SCHEFFE 
TEST 

HYPOTHESIS 
POST &factor1*&factor2/ SCHEFFE 
TEST 

USE &outfile 
CATEGORY &factor1 &factor2 / REPLACE 
LINE ESTIMATE*&factor1 / OVERLAY GROUP=&factor2, 
TITLE='Least Squares Means', 
YLAB=&dep 
CATEGORY &factor1 &factor2 / OFF 
PLOT student*estimate / SYM=1 FILL=1 
STEM student 


Creating the same output without a template requires the following dialogs: 
• Box Plot 
• ANOVA: Estimate Model 
• GLM: Pairwise Comparisons (three times) 
• Line Chart 
• Scatterplot 
• Stem-and-Leaf 


Each dialog requires variable selection. Creating a command file does 
automate these analyses, but command files do not generalize across data files. 

By using this template, we replace the eight dialogs (and the necessary 
specifications for those dialogs) with four simple prompts. In addition, the resulting 
template can generate results for any specified data file. 
6 
Working with Output 


Lou Ross 
(revised by Poornima Holla) 


All of SYSTAT’s output appears in the Output Editor, with corresponding entries 
appearing in the Output Organizer. You can save and print your results using the File 
menu. Using these options, you can: 

• Reorganize and reformat output. 
• Save data and output in text files. 
• Save charts in a number of graphics formats. 
• Print data, output, and charts. 


Save output from statistical and graphical procedures in SYSTAT output (SYO) 
files, Rich Text Format (RTF) files, Wordpad-compatible Rich Text Format (RTF) 
files, HyperText Markup Language (HTML) files, or MIME HTML (MHT) files. 


You can open SYSTAT output in word processing and other applications by saving 
it in a format that other software recognizes. SYSTAT offers a number of output 
and graph formats that are compatible with most Windows applications. 

Often, the easiest way to transfer results to other applications is by copying and 
pasting using the Windows clipboard. This works well for charts, tables, and text, 
although the results vary depending on the type of data and the target application. 


Output Editor 


The Output Editor displays statistical output and graphics. You can activate the 
Output Editor by clicking on the tab, or selecting 


View 
Output Editor... 
Format 


You can reorganize output and insert formatted text to achieve any desired appearance. 
In addition, paragraphs or table cells can be left-, center-, or right-aligned. 


Tables. Several procedures produce tabular output. You can format text in selected 
cells to have a particular font, color, or style. To further customize the appearance of 
the table (borders, shading, and so on), copy and paste the table into a word processing 
program. 

Collapsible links. Output from statistical procedures appears in the form of collapsible 
links. You can collapse/expand these links to hide/view certain parts of the output. 


Graphs. Double-clicking a graph opens it in the Graph Window. When the Output Editor contains more than one graph, the Graph Window contains the last graph.


SYSTAT provides a number of formatting tools. To change the format of output, from the menus choose:
Edit
Format...
and then apply the desired formatting tool. Common formatting tools also appear on the toolbar available through Customize... on the View menu, and on the toolbar of the Output Editor.


Fonts. SYSTAT displays output in the Arial font by default. To open the Font dialog box, from the menus choose:
Edit
Format
Font...
Use the options in the Font dialog box to change the appearance of any selected output text. You can select the font, style, and size, as well as effects such as underlining and font color.




Font Style. You can make selected output text bold, italic, or underlined, and change its font color, by selecting these options from Format on the Edit menu.


Alignment. You can align selected output text left, right, or center by selecting these options in Format.


Bullets and Numbering. Any selected text can be formatted as a Numbered list or a 
Bulleted list from the options in Format. You can also reduce the indentation of the text 
or indent text by selecting Outdent or Indent respectively. 

Inserting Image. You can insert an image in the desired location of the Output Editor 
by selecting the Insert Image option in Format. 

Collapsible Links. By selecting the Expand All option in Format you can expand all 
the links in the output; you can collapse all those expanded links by selecting the 
Collapse All option in Format. 


You can search for specific numbers or text in the Output Editor. 
To open the Find dialog box, from the menus choose:


Edit 
Find... 


[Figure: Find dialog box, with a Find what field, Match whole word only and Match case check boxes, Direction (Up or Down) options, and Find Next and Cancel buttons]


Search strings can contain complete or partial text. SYSTAT searches in the specified direction (up or down) from the current location. A search string may consist of letters only, or of letters combined with numbers and punctuation. For any search involving letters, you can impose a case restriction. For example, selecting Match case prevents a search for median from finding Median. Selecting Match whole word only restricts matches to complete words.
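The Match case, Match whole word only, and Direction options can be illustrated with a short Python sketch. It is a hypothetical model of the rules above, not SYSTAT code: the function name find_next is invented for illustration, and the sketch assumes that a search reaching the end (or beginning) of the text reports no match rather than wrapping around.

```python
import re


def find_next(text, query, start, match_case=False, whole_word=False, down=True):
    """Return the position of the next match of query in text, or -1.

    Hypothetical model of the Find dialog options; not SYSTAT code.
    """
    flags = 0 if match_case else re.IGNORECASE
    pattern = re.escape(query)  # treat punctuation in the query literally
    if whole_word:
        # \b requires a word boundary on each side of the match
        pattern = r"\b" + pattern + r"\b"
    positions = [m.start() for m in re.finditer(pattern, text, flags)]
    if down:
        # Direction Down: first match at or after the current location
        after = [p for p in positions if p >= start]
        return after[0] if after else -1
    # Direction Up: last match that begins before the current location
    before = [p for p in positions if p < start]
    return before[-1] if before else -1


output = "Median and median"
print(find_next(output, "median", 0))                   # 0
print(find_next(output, "median", 0, match_case=True))  # 11
```

The whole-word option in this sketch relies on regular-expression word boundaries, which may not match SYSTAT's notion of a word for search strings that begin or end with punctuation.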


Note: SYSTAT searches in the active space. Click the Output Editor to make it active. If the Commandspace is active, SYSTAT searches the active tab of the Commandspace.


Output Editor Right-Click Menu 


Right-clicking in the Output Editor provides standard editing features. These are: 


• Cut. Cut the selection and place it on the clipboard for pasting at the desired location(s).

• Copy. Copy the selection and place it on the clipboard for pasting at the desired location(s).

• Paste. Paste previously cut or copied output.

• Delete. Delete the selections in the active tab.

• Copy All. Copy all the content in the Output Editor.

• View Source. View the HTML source code.

• Refresh. Refresh the content being viewed in the Output Editor.

• Print Preview. Display the file in the active tab as it would appear when printed. You can view multiple pages at a time, scroll through pages, and zoom in or out.

• Collapse All/Expand All. Collapse or expand all the links in the Output Editor.

• Show Toolbar. Show or hide the Format Bar.

• New Output. Open a new output file in the Output Editor where further output will appear. If an output file is already open, it is closed with the option of saving it.

• Save As. Save the file in the active tab as a separate file. You will be prompted to specify the name and location of your choice.

• Options. Set SYSTAT's global options according to your preferences.


Note: Cut, Copy, and Delete are available only when a selection has been made.


Output Organizer 


The Output Organizer serves primarily as a table of contents for the Output Editor. Use 
it to jump to any location in the Output Editor without having to scroll through long 
statistical or graphical results. 
Each data file opened during a session creates a new tree folder in the Output Organizer. Within each tree folder, each procedure generates entries: one for text results and one for every graph. If no data file is open, the entry is created under the last tree folder. Clicking an entry scrolls the Output Editor to the corresponding output. Double-clicking a graph entry opens the corresponding graph in the Graph Window. When the Graph Window is active, clicking a graph entry dynamically changes the graph displayed in the Graph Window. You can close a folder by clicking the “-” to its immediate left; clicking a “+” opens the corresponding folder. Opening and closing folders in the Organizer does not affect the Output Editor.

A second use of the Output Organizer is to reorganize the results in the Output 
Editor. Cutting, copying, or pasting in the Organizer yields parallel results in the 
Output Editor. For example, clicking an icon in the Output Organizer selects that entry. 
Clicking a folder icon selects all entries contained in that folder. With the Organizer 
entry selected, copying (via the Edit menu or right-clicking) results in the output 
corresponding to the selection being copied to the clipboard. Select a new entry and 
paste to insert the copied output at the new location. Note that although the Organizer represents an outline of what will be copied from the Output Editor, the Output Editor itself does not show the selection.


Transformations. Because transformations do not produce output, they do not 
generate Output Organizer entries. To note when transformations occur, echo the 
commands or add notes to the output. However, echoed commands still do not yield an 
entry in the Organizer. 


To Move Output Organizer Entries 


You can reorganize SYSTAT’s output simply by selecting and dragging Organizer 
entries to new locations. Use the Shift key to select a range of entries or the Ctrl key to 
select multiple but nonconsecutive entries. Selecting a folder entry causes all items 
within the folder to be selected. The Organizer places selected items immediately after, 
and at the same level as, the location to which you drag them. 

If you select items at differing levels and drag them to a new location, SYSTAT 
places the entries at the level of the target location. 
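The placement rule can be modeled in Python by treating the Organizer as a flat list of (caption, level) pairs in display order. This is an illustrative sketch only: the function move_entries and the sample captions are hypothetical, and the sketch assumes that when a folder is selected, the indices of its contents are passed in as part of the selection, as described above.

```python
def move_entries(entries, selected, target):
    """Move selected entries to just after target, at target's level.

    entries  -- list of (caption, level) pairs in Organizer order
    selected -- set of indices being dragged (a selected folder should
                include the indices of all items inside it)
    target   -- index of the entry the selection is dropped on
    Hypothetical model of the drag-and-drop rule; not SYSTAT code.
    """
    if target in selected:
        raise ValueError("cannot drop a selection onto itself")
    level = entries[target][1]
    # Every moved entry takes the target's level, even if levels differed.
    moved = [(entries[i][0], level) for i in sorted(selected)]
    kept = [e for i, e in enumerate(entries) if i not in selected]
    # Position of the target once the selected entries are removed.
    pos = target - sum(1 for i in selected if i < target)
    return kept[: pos + 1] + moved + kept[pos + 1 :]


organizer = [("Data A", 0), ("Stats", 1), ("Graph", 1), ("Data B", 0), ("Regression", 1)]
print(move_entries(organizer, {1, 2}, 4))
```

Dragging the Stats and Graph entries onto Regression places them immediately after it, at Regression's level.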
To Insert Tree Folder


SYSTAT generates Output Organizer entries for all statistical and graphical 
procedures. You can also create customized tree folders. Use customized trees to place 
output from several procedures in one location. 


To insert a new tree folder, from the menus choose: 


Edit 
Output Organizer 
Insert Tree Folder... 


Alternatively, right-click in the Output Organizer and select Insert Tree Folder. SYSTAT creates a folder named "New Folder". To rename it, select the folder and from the menus choose:
Edit
Output Organizer
Rename...


Alternatively, right-click the folder and select Rename. A new tree folder appears just below, and at the same level as, the selected Organizer entry.

You can rename any Output Organizer entry, and collapse or expand all trees, using Output Organizer on the Edit menu or the right-click menu of the Output Organizer. You can also view a data file from the right-click menu of the Output Organizer.


Configuring the Output Organizer 


Output Organizer headings are often truncated at the right edge of the pane. To view 
the entire heading, move the mouse over the heading. 

Alternatively, you can resize the Workspace by dragging the boundary between the 
Viewspace and Workspace to new locations. Position the pointer of your mouse over 
the boundary until a double-headed arrow appears. Click your left mouse button and 
hold it down while you drag the pane edge to the desired location. 
You can hide (or view) the entire Output Organizer without resizing it by selecting:
View
Workspace...
Although the Output Organizer may be hidden, subsequent output still generates entries in the tree. Consequently, you can jump quickly to specific output by reopening the Workspace and clicking its entries.


Workspace settings persist across SYSTAT sessions. For example, if you hide the 
Workspace and close SYSTAT, the next SYSTAT session begins with the Workspace 
hidden. 


To view the entire Viewspace in the full screen mode, from the menus choose: 


View 
Full Screen Viewspace... 


Output Organizer Right-Click Menu 


Right-clicking in the Output Organizer provides some important features. These are: 
• Delete. Delete the selections in the active tab.

• Rename. Rename the selected tree folder.

• Expand All/Collapse All. Expand or collapse the Output Organizer without affecting the output in the Output Editor.

• Insert Tree Folder. Insert a new tree folder under the active Output Organizer data node. You can drag and drop Output Organizer text and graph nodes and other tree folders into this tree folder.

• Set as Active Data File. Set the data file as active. With more than one data file open in the Output Organizer, this lets you make any previously opened data file the active one.

• View Data. View the data file corresponding to the selected data file node.

• New Output. Open a new output file in the Output Editor where further output will appear. If an output file is already open, it is closed with the option of saving it.

• Clear Output. Clear all the output generated in the Output Editor so far.

• View Graph. View the graph corresponding to the selected graph node in the Graph Window.

• Save As. Save the file that is in focus as a separate file. You will be prompted to specify the name and location of your choice.

• Show Detailed Captions. Show the underlying SYSTAT commands as Output Organizer node captions.


Saving Output and Graphs 


You can save the contents of the active tab or pane in a file. SYSTAT saves combined 
statistical and graphical output in four file types. In addition, individual graphs can be 
saved in a number of graphic formats. 

When you choose Save Active File from the File menu, what is saved depends on which pane is active. If either the Output Organizer or the Output Editor is active, the entire contents of both panes are saved. When you choose Save All from the File menu, the current output, the data file, and the current file of the Commandspace are all saved.
To Save Output


SYSTAT displays statistical and graphical output in the Output Editor. To save the contents of the pane, click the Output Organizer or the Output Editor and choose Save As from the File menu. You can save Data, Command, Output, Graph, or Log files using Save on the File menu.


Save As 


Select a directory and specify a name and file type for the output. Output can be saved in SYSTAT Output (*.SYO), Rich Text Format (*.RTF), Rich Text Format (Wordpad compatible) (*.RTF), HyperText Markup Language (*.HTM), or MHT (*.MHT) format.


Note: Unlike output saved in SYO or RTF format, output saved in HTM or MHT format has the following properties:
• HTML or MHT output is not editable.

• Because HTML and MHT underlie web page creation, presenting the output on the Internet simply involves creating a link from a web page to the filename.htm or filename.mht file. In addition, HTML or MHT output lets you share your results with colleagues who do not (yet) have SYSTAT, but do have a browser, by simply supplying the .htm or .mht file.


Using Commands 
To save output, enter the following: 


OSAVE FILENAME / SYO or RTF or HTML or MHT


Omitting the format option (SYO, RTF, HTML, or MHT) saves the output as a SYSTAT output file with an .SYO extension.


To Direct Output to a File or Printer 


You can use commands to send output directly to a file or the printer: 
OUTPUT <FILENAME> | VIDEO or * | PRINTER or @ | 
[ /COMMANDS, ERRORS, WARNINGS ] 


For example, the commands below send a listing of cases, including commands, to the 
text file MYFILE.DAT. The OUTPUT * command at the end closes the text file so that 
subsequent output is sent to the screen only. 


Table 1: 


USE OURWORLD 

OUTPUT MYFILE /COMMANDS 
LIST COUNTRY$ HEALTH 
OUTPUT * 


To Save Results from Statistical Analyses 


Many procedures include an option such as Save or Save File that saves the results of the analysis in a SYSTAT data file. The contents of the file depend on the analysis. For example:
• Correlations can save Pearson and Spearman correlations.
• Factor Analysis can save factor scores, residuals, and a number of other statistics.
• Linear Regression can save residuals and diagnostics for each case.

232 
Chapter 6 


• Basic Statistics can save selected statistics for each level of one or more grouping variables.
• Crosstabs can save the count in each cell for later use as table input.


Check each procedure to see what is saved. 


To Save Graphs 


SYSTAT displays graphs in the Output Editor of the Viewspace. You can save the graphs along with the output by using Save on the File menu. To save an individual graph, double-click the graph to activate the Graph Window and use Save As on the File menu.




By default, the file is saved as a Windows Metafile (*.WMF). You can select a different file type from the drop-down list. Available formats include:


• Windows Metafile (*.WMF)
• Windows Enhanced Metafile (*.EMF)
• Encapsulated PostScript (*.EPS)
• PostScript (*.PS)
• JPEG (*.JPG)
• Macintosh PICT (*.PCT)
• Windows Bitmap (*.BMP)
• Computer Graphics Metafile: binary or clear text (*.CGM)
• Tagged Image File Format (*.TIFF)
• Graphics Interchange Format (*.GIF)
• Portable Network Graphics (*.PNG)


Depending on the graphic format, you can select from a number of options when 
saving the file. See the online help for details. 


Using Commands 
To save an individual graph, enter the following: 


GSAVE FILENAME / FILETYPE 


For FILETYPE, enter one of the following: WMF, EMF, EPS, PS, JPG, PCT, BMP, CGM, TIFF, GIF, or PNG. SYSTAT saves the most recently created graph as FILENAME.
Issuing multiple, consecutive GSAVE commands results in multiple graphs being 
saved. SYSTAT saves the most recent first, the graph created before the most recent 
graph second, and so on. However, issuing any other command after a GSAVE 
command resets the internal index for the next GSAVE to the most recent graph. 
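The GSAVE index behaves like a pointer that steps one graph further back on each consecutive GSAVE and returns to the newest graph after any other command. The Python sketch below models only that rule; the class GraphIndex and the graph names are hypothetical, and the sketch assumes (this manual does not say) that stepping past the oldest graph is an error. The same behavior is described for GPRINT under Printing Graphs Using Commands.

```python
class GraphIndex:
    """Hypothetical model of the GSAVE index; not SYSTAT code."""

    def __init__(self, graphs):
        self.graphs = list(graphs)  # graphs in creation order, oldest first
        self.steps_back = 0         # 0 means "the most recent graph"

    def gsave(self):
        """Return the graph the next GSAVE would save, then step back one."""
        if self.steps_back >= len(self.graphs):
            raise IndexError("no earlier graph to save")  # assumed behavior
        graph = self.graphs[len(self.graphs) - 1 - self.steps_back]
        self.steps_back += 1
        return graph

    def other_command(self):
        """Any command other than GSAVE resets the index to the newest graph."""
        self.steps_back = 0


session = GraphIndex(["histogram", "scatterplot", "boxplot"])
print(session.gsave())  # boxplot (most recent)
print(session.gsave())  # scatterplot
session.other_command()
print(session.gsave())  # boxplot again
```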

To save all graphs in the Output Editor, use: 


GSAVE ROOT / ALL FILETYPE 


When naming the resulting files, the software appends consecutive integers beginning 
with 1 to ROOT. 
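For example, with ROOT set to FIG and three graphs in the Output Editor, the base names are FIG1, FIG2, and FIG3. The Python sketch below expresses that naming rule; gsave_all_names is a hypothetical helper, and it assumes that graphs are numbered in the order in which they appear in the Output Editor. It leaves out the file extension that the chosen FILETYPE adds.

```python
def gsave_all_names(root, n_graphs):
    """Base file names for GSAVE ROOT / ALL FILETYPE with n_graphs graphs.

    Hypothetical helper: graph i (counting from 1) gets ROOT followed by i;
    the extension for the chosen FILETYPE is not modeled.
    """
    return [f"{root}{i}" for i in range(1, n_graphs + 1)]


print(gsave_all_names("FIG", 3))  # ['FIG1', 'FIG2', 'FIG3']
```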


To Export Results to Other Applications 


You can open your saved output and charts in word processing and other applications. 
In SYSTAT, save the file in a format that the other application can handle; then open 
or import the file in that application. SYSTAT offers a number of graph formats that are compatible with most Windows applications. For example, you can save a SYSTAT graph as a Windows Metafile (*.WMF) and then insert or import the metafile into most
Windows word processing applications. See the target application’s documentation for 
specific information. 


To Export Results Using the Clipboard 


Often, the easiest way to transfer results to other applications is to copy and paste using 
the Windows clipboard. This works for charts as well as text, although results vary 
depending on the target application. 


In SYSTAT, select the output or chart. 


From the menus choose: 
Edit 
Copy 
In the other application, position the cursor where you want the output to appear. 


From the menus choose: 


Edit 
Paste 


Tips: 


• If you have problems with Paste, try using Paste Special on the Edit menu in the target application. With Paste Special, you can specify whether you want to paste the clipboard contents as text or as a Windows Metafile (graphic). (Note that Paste Special is not available in all applications.)


• For columns to line up properly, you must highlight text output after you paste it and apply a fixed-pitch font (for example, Courier or Courier New). Alternatively, use Paste Special on the Edit menu to paste the text as a metafile graphic.
Printing


In any SYSTAT window, choose Print from the File menu to open the Print dialog box. 


Print 


Select a printer and a print range. You can choose to print the current selection, all pages, or a specific page range.

Use the Print Preview command on the File menu to preview the content before printing it.


Print Preview 


In any SYSTAT window, choose Print Preview from the File menu to display the active document as it would appear when printed. The main window is replaced with a print preview window in which one or two pages are displayed in their printed format. The print preview toolbar lets you view one or two pages at a time, move back and forth through the document, zoom in and out of pages, and initiate a print job.
Page Setup


To optimize printed output, you may need to adjust various page settings. The available 
options vary for different printers. To open the Page Setup dialog box, choose Page 
Setup from the File menu. 


Print Setup 




If more than one printer is installed on your system or network, you can choose which one to print to. You can also specify the paper size and orientation: portrait (tall) or landscape (wide).


Printing Graphs Using Commands 
You can print individual graphs by entering the following: 


GPRINT / LANDSCAPE or PORTRAIT 


SYSTAT automatically sends the most recently created graph to the default printer. In the absence of an orientation specification, the software uses the setting for the current printer. Issuing multiple, consecutive GPRINT commands results in multiple graphs being printed: SYSTAT prints the most recent graph first, the graph created before the most recent graph second, and so on. However, issuing any other command after a GPRINT command resets the internal index for the next GPRINT to the most recent graph.
7
Customization of the SYSTAT Environment


(Revised by Rajashree Kamath) 


By default, the user interface contains, from top to bottom:


• Toolbars
• Workspace and Viewspace
• Commandspace
• Status Bar


However, as you work with SYSTAT, you may discover that an alternative window 
organization would better match the way you work. 


The interface for SYSTAT can be completely restructured to create a comfortable, 
analytical environment in which you can be maximally productive. 




You can:
• resize, hide, and reorganize windows and panes
• create, reposition, and modify toolbars
• assign sets of command files to a toolbar button, allowing quick submission of commonly used commands
• add menu items for frequently used commands and command files
• define settings for output, data, and graph appearance
• specify file locations for navigational ease
• define and set themes to suit your needs
• set the output to appear either based on the data files used or in the order of execution of analyses


Commandspace Customization


Users who frequently use SYSTAT's command language may prefer a larger command area for viewing and editing command files. To change the size of the Commandspace, hover the mouse over its upper boundary until the cursor changes to a double-headed arrow, then hold down the mouse button and drag the boundary to a new location.


The output area is automatically resized to accommodate the resized Commandspace. 


Alternatively, you can undock the Commandspace from the bottom edge of the user interface to increase the space available for displaying output. To do this:

• Click the upper boundary or sidebar of the Commandspace, making sure that the mouse pointer does not change appearance, and drag the outline to a new location without releasing the mouse button. Hold down the Ctrl key as you drag to prevent docking with the user interface. Release the mouse button and the Ctrl key when the outline indicates the desired position.

• Double-click the upper boundary of a docked Commandspace to detach it into its last undocked position.

Similarly, you can dock the Commandspace to its original position: 

• Click the title bar of the undocked Commandspace and drag the outline to a new location in the user interface without releasing the mouse button. Release the mouse button (do not press the Ctrl key while you do this) when the outline is at the desired position and touches one of the edges of the user interface or of the Viewspace.

• Double-click the title bar of an undocked Commandspace to reattach it at its last docked position.


Hiding the Commandspace 


An undocked Commandspace always appears in front of the rest of the user interface and may obscure output. In such a situation, it can be hidden until needed. Selecting Commandspace from the View menu, pressing Ctrl + W, right-clicking in the toolbar area and selecting Commandspace, or clicking the Close button after undocking it toggles the visibility of the Commandspace. Alternatively, you can hide the Commandspace and use a text editor such as Notepad for command entry.


Tip: Users who favor dialog use over typing commands should hide the 
Commandspace to maximize the area available for output. 
Workspace Customization


The technique to customize the Workspace is analogous to that explained for the 
Commandspace. The Workspace can also be hidden either by invoking the View menu 
and selecting Workspace, by right-clicking on the toolbar area and selecting 
Workspace, or by clicking the Close button after undocking the Workspace. 


Customizing the Output Organizer 


You can customize the captions of text nodes in the Output Organizer. By default, the caption is the title of the analysis that the node pertains to, and the associated command appears as a tooltip on mouse hover. To use the tooltips themselves as node captions, from the menus choose:
Edit
Output Organizer
Show Detailed Captions...

For a given analysis, the associated command is the most significant command related to that analysis, typically the HOT command. For example, for least-squares regression, the default node caption is "OLS Regression", whereas the detailed node caption is the MODEL command line.


Adding Examples 


The Examples tab in the Workspace contains a “SYSTAT Examples” tree that is organized into folders and nodes: the folders correspond to volumes or chapters of the SYSTAT User Manual, and the nodes correspond to the example command scripts therein. Double-clicking a node executes the underlying command script. You can add your own examples to this tree, organized according to the directory structure of the folder containing them. To add examples, from the menus choose:
Utilities
Add Examples...
Add Examples


In the dialog box that opens, the left-hand side contains a box displaying all the drives, folders, subfolders, and files on your hard disk. A check box beside each item indicates whether or not it is to be included in the Examples tab. Click the check box beside a folder twice to include the folder along with all its subfolders and files; the check box changes its appearance when you do so. Click it once to include just the folder and the files in it. Click the check box beside a file once to insert a node corresponding to the file in the Examples tree. Clicking again unchecks an item.


When you check a folder, make sure you have expanded all the nodes that belong to it so that all the filenames in it are visible. Once you have made your selections, enter an Example node caption. This caption is set for the top-level folder that is to contain the links to your example command files. Then click Select so that the corresponding tree structure is displayed on the right-hand side of the dialog box. You can review this tree and make any further changes. Once you have finalized your selections, click Close. This triggers the creation of an initialization file corresponding to your selections. Close the current session of SYSTAT and reopen it to see the newly added examples. If you need to replace an examples tree that you have created, specify the same Example node caption when you create the new tree.


Note: You can also customize the tree structure directly using the initialization files in the INI subfolder of the SYSTAT program folder. Edit the "SycSamples.ini" file while maintaining the formatting of the content (described below). This initialization file expects the related command files to be in the SYSTAT Command folder, so you can add nodes for your own command files provided they are saved in the Command folder. Alternatively, you can save your command files in any desired location, create a new initialization file in the INI folder, and enter the file path of the location suffixed by "\#your cmdfiles.ini" in the SysMaster.ini file in the INI folder. Use the following guidelines when creating the content of your cmdfiles.ini:


• Type the top-level folder caption without indentation.

• Use a hash (#) at the end of a caption to define tree folders or nodes.

• Indent with the appropriate number of tab stops to create subfolders or nodes within a given folder.

• If a caption relates to a node, type the filename (including the file extension) after the hash. You can also include a subfolder name with the filename. You can also omit the caption, in which case the filename is used as the node caption.
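The following Python sketch shows one reading of these guidelines by generating the lines of such an initialization file from a nested list. The helper ini_lines, the caption My Examples, and the files ols1.syc and anova1.syc are hypothetical; compare the generated text with the shipped SycSamples.ini before relying on it.

```python
def ini_lines(tree, depth=0):
    """Return initialization-file lines for a nested examples tree.

    tree is a list of (caption, content) pairs: content is a list for a
    folder, or a filename string for a node. An empty caption is omitted,
    so the filename is used as the node caption. Hypothetical helper.
    """
    lines = []
    for caption, content in tree:
        indent = "\t" * depth  # one tab stop per nesting level
        if isinstance(content, list):
            lines.append(f"{indent}{caption}#")  # folder: caption ends with #
            lines.extend(ini_lines(content, depth + 1))
        else:
            lines.append(f"{indent}{caption}#{content}")  # node: filename after #
    return lines


examples = [
    ("My Examples", [
        ("Regression", [("Simple OLS", "ols1.syc")]),
        ("", "anova1.syc"),
    ]),
]
print("\n".join(ini_lines(examples)))
```

To create the file, write the joined lines to a new .ini file in the INI folder and reference it from SysMaster.ini as described above.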


Viewspace Customization 


By default, the Data Editor and the Graph Editor tabs are in the Viewspace. However, 
users may want to view the Data Editor and the Graph Editor simultaneously. To do 
this, click the Window menu or right-click in the toolbar area and select Tile or Tile 
Vertically. All the panes in the Viewspace get laid out in a tiled fashion. Click the 
Minimize or Close (if it is enabled) button of the panes that you do not want 
to see, and select Tile or Tile Vertically again. The pane that is active will be placed first 
in the tiled layout. Using the Window menu (or context menu of the toolbar area), you 
can also Cascade windows or Arrange Icons that have been minimized. Double-click 
one of the title bars to dock the panes to their default or previously docked positions. 


Maximizing the Viewspace 


Almost every command and dialog box creates output, all of which appears in the Output Editor of the Viewspace. Occasionally, statistical output or graphs may be too large to view in the Output Editor, and data files typically contain more rows than fit in one view. Although scroll bars let you control the contents of the viewable area, displaying graphs or results in their entirety in a single pane simplifies interpretation.


The most obvious method for increasing the size of the Output Editor is to maximize the user interface to fit your monitor. You can close toolbars that you do not use frequently. You can resize or undock the Commandspace or Workspace to increase the viewable output region. You can also work with the Viewspace in full screen mode. To set the Viewspace to full screen mode, from the menus choose:


View
Full Screen Viewspace...
Alternatively, right-click in the toolbar area and select Full Screen Viewspace.


However, some output may still require scrolling. When resizing alone cannot create 
an area large enough to view your output, consider hiding elements of the user 
interface, such as the Commandspace or the Workspace. 


Startpage Customization 


You can resize the partitions of the Startpage by positioning the mouse over any of the boundaries until the cursor changes to a double line, clicking, and then dragging the boundary to the desired position. You can close the Startpage for the remainder of the session by clicking the View menu and selecting Startpage, by right-clicking its tab and selecting Close, or by right-clicking the toolbar area and selecting Startpage. You can also prevent the Startpage from appearing in subsequent sessions by clearing the Show at startup check box in the Startpage.


Status Bar 


The status bar appears at the bottom of the user interface. 




When the mouse pauses on a toolbar button or menu entry (including right-click 
menus), the status bar displays a brief description of that item. These descriptions help 
guide you to the most appropriate procedure for a desired task. When the Graph Editor 
is active with a graph in it, the status bar displays the name of the graph element on 
which the mouse pointer is currently positioned. 


The left side of the status bar shows the status of some output-related options:


• QGRAPH. Displayed when statistical Quick Graphs are set to appear in the Output Editor. Toggle this mode on or off by clicking it.

• HTM. Displayed when HTML-based output is set to appear in the Output Editor. Click it to toggle between HTML formatting and plain text formatting of the output.

• PLENGTH. NONE/SHORT/MEDIUM/LONG. NONE, SHORT, MEDIUM, or LONG is displayed when the corresponding output length is set using the Global Options dialog box or the PLENGTH command.

• ECHO. Displayed when the commands you issue are set to appear in the output. Click it once if you do not want the commands to be echoed.

• VDISP. LABELS/NAMES/BOTH. LABELS, NAMES, or BOTH is displayed depending on the global setting for the display of variable labels.

• LDISP. LABELS/DATA/BOTH. LABELS, DATA, or BOTH is displayed depending on the global setting for the display of value labels.

• NODE. Displayed when detailed node captions are to be shown for the Output Organizer. Click it once to display brief captions.


The middle portion of the status bar will show information about existing processing 
conditions on the data, and also allow you to edit them: 


SEL. Displayed when case selection is in effect. Pause the mouse on this to see the 
condition used for selection in the tooltip that appears. Click on SEL to invoke the 
Data: Select Cases dialog box and edit the condition or turn off selection. 


BY. Displayed when one or more grouping (By Groups) variables are declared. 
Pause the mouse on this to see the currently defined grouping variable(s) in the 
tooltip that appears. Click on BY to invoke the Data: By Groups dialog box and 
add/delete grouping variables, or turn off the By Groups declaration. 


WGT. Displayed when a weight variable is declared or exists in the data file. Pause 
the mouse on this to see the currently defined weight variable in the tooltip that 
appears. Click on WGT to invoke the Data: Case Weighting: By Weight dialog box 
and change the weight variable or turn off case weighting. 


FRQ. Displayed when a frequency variable is declared or exists in the data file. 
Pause the mouse on this to see the currently defined frequency variable in the 
tooltip that appears. Click on FRQ to invoke the Data: Case Weighting: By 
Frequency dialog box and change the frequency variable or turn off frequency 
declaration. 


ID. Displayed when an ID variable is declared or exists in the data file. Pause the 
mouse on this to see the currently defined ID variable in the tooltip that appears. 
Click on ID to invoke the Data: ID Variable dialog box and change the ID variable 
or turn off ID variable declaration. 
CAT. Displayed when one or more categorical variables are declared or exist in the
data file. Pause the mouse on this to see the currently defined categorical 
variable(s) in the tooltip that appears. Click on CAT to invoke the Data: Category 
dialog box and add/delete categorical variables, or turn off category declaration. 


The right end of the status bar shows the current state of the command
autocompletion mode and four keyboard states:

AUTO. Displayed when the Commandspace supports autocompletion of
commands. Click on it to toggle this mode. See the Global Options section for
details about this feature.

OVR. Displayed when the keyboard is in overstrike mode. In this state, typed text
replaces the text at the current location. OVR is grayed out when you press the Insert
key to switch to insert mode. Insert mode inserts newly typed text at the current
cursor location, shifting any existing text to the right.

CAP. Displayed when Caps Lock is active. In this state, every typed letter appears
in upper case. Use the Caps Lock key to toggle this state on and off.

NUM. Displayed when Num Lock is active. With Num Lock on, the keyboard
keypad enters numbers. With Num Lock off, the keypad moves the cursor in the
current window. The Num Lock key toggles this state on and off.

SCRL. Displayed when Scroll Lock is active. With Scroll Lock on, pressing the arrow
keys while the Data Editor is active scrolls the entire sheet. Turn Scroll Lock off if
you want to use the arrow keys to navigate around the Data Editor.


Status Bar Customization 


Of the status bar items mentioned above, the QGRAPH, HTM, ECHO, SEL, BY, WGT,
FRQ, ID, CAT and OVR items appear by default. You can add or remove items from
the status bar by right-clicking on it. In the context menu that appears, check the items
you want to keep and uncheck the items you do not use. Select All Items to display
every item, or No Items to hide them all. To revert to the default set of items, select
Default Items. If you do not need the status bar at all, or need more room for a
window, from the menus choose:


View 
Status Bar... 
Repeat the above steps to bring back the status bar. 
Customizing Menus and Toolbars in SYSTAT


Menu Customization 


SYSTAT has a default organization for the menus and toolbars, based on similarity of 
features. However, users can customize these according to their needs and preferences 
using the Customize dialog box. 


To open the Customize dialog box, from the menus choose: 


View 
Customize... 


Alternatively, right-click in the toolbar area and select Customize.

The four tabs in the Customize dialog box can be used to customize menus (including
right-click or context menus), toolbars, and keyboard shortcuts. While this dialog is
open, a context menu is also available for customizing menu items and toolbar buttons.


Commands Customization 


Any menu, menu item, or toolbar button can be moved from its default position to any
other position in the menu bar, in any menu, or in any toolbar. With the Customize
dialog open (or, for toolbar buttons and terminal menu items, while holding down the
Alt key), drag the item to the desired position and drop it; a border surrounds the item
while it is being dragged. To copy an item instead of moving it, hold down the Ctrl key
as well. To remove an item completely, drag it out of the menu and toolbar area.
Dragging an item slightly to the right creates a separator before it, while dragging it
slightly to the left removes the separator, if any. All changes can be reset using the
Reset and Reset All buttons in the Toolbars and Menu tabs of the Customize dialog, or
the Default Settings link in the SYSTAT program group of the Windows Start Menu.


You can also create new menus, menu items or toolbar buttons by dragging and 
dropping items from the list of items in the Commands tab of Customize, into the 
desired menu or toolbar position. 
Figure: Customize dialog box, Commands tab (Categories, Commands, and Description areas)


The Categories list contains the names of all the menus and menu items. Clicking any
of these displays the corresponding menu items in the Commands list. Drag and drop
items from this list to the desired position. If you are not sure what a particular item
corresponds to, select it to view a description of the item in the Description area.


Items that have images preceding their names are displayed as buttons with those
images on them. When you drop an item that has no image, the Button Appearance
dialog box pops up.
Three choices are available:


Image only. The image that you select from the Image area will be displayed.

Text only. The button will only have a caption. Use the default button text that is
displayed in the Button text area, or enter your own text.

Image and text. Both the image that you select and the desired text will appear.


For the first and third options, you can also create your own image or edit an existing
one in the Image area. Press New, or select an existing image and press Edit, to
invoke the Edit Button Image dialog box.
Use any of the colors shown in the palette, and any of the tools in the Tools area, to
create an image in the Picture area. The Picture area is split into pixels arranged in
16 rows by 15 columns. Clicking in the Picture area colors the pixels in a way that
depends on the selected tool:


Pencil. Fills any pixel that you click on with the color selected in the Colors area.

Fill. Fills the enclosed area in which you click (an area with an unbroken boundary
made of a non-default color) with the selected color.

Color selection. Reads the color of the pixel that you click on, and automatically
selects that color in the Colors area.

Line. Draws a line of the selected color along the pixels over which you press and
drag the pointer.

Rectangle. Draws a rectangle of the selected color; the line along which you press
and drag the pointer forms its diagonal.

Ellipse. Draws an ellipse of the selected color; the line along which you press and
drag the pointer forms its diagonal.


Copy. Copies the image in the Picture area to the clipboard.

Paste. Pastes the image in the clipboard to the Picture area.

Delete. Clears the image in the Picture area.


When you press OK, the image will be displayed in the User-defined image area. Press 
OK to use it, or press Edit to edit it further. 
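The Fill tool described above behaves like a flood fill. The sketch below models it under stated assumptions: the Picture area is a 16 by 15 grid of color names, "default" stands for the unpainted background color, and only pixels that touch horizontally or vertically are treated as connected (a 4-connected fill). The names `new_picture` and `fill` are hypothetical; this illustrates the idea and is not SYSTAT's own implementation.

```python
from collections import deque

# Assumed model of the Picture area: 16 rows by 15 columns of color names.
ROWS, COLS = 16, 15
DEFAULT = "default"  # assumption: name for the unpainted background color


def new_picture():
    """Return a blank picture in which every pixel holds the default color."""
    return [[DEFAULT] * COLS for _ in range(ROWS)]


def fill(picture, row, col, color):
    """4-connected flood fill starting at the clicked pixel (row, col).

    Every pixel reachable from the clicked pixel through horizontally or
    vertically adjacent pixels of the same original color is recolored.
    Pixels of any other color act as the boundary.
    """
    target = picture[row][col]
    if target == color:
        return picture  # nothing to change; also avoids revisiting pixels forever
    picture[row][col] = color
    queue = deque([(row, col)])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < ROWS and 0 <= nc < COLS and picture[nr][nc] == target:
                picture[nr][nc] = color
                queue.append((nr, nc))
    return picture
```

For example, after drawing a closed red rectangle outline, calling `fill` at a pixel inside it with "blue" recolors only the interior, because the red boundary stops the spread; calling it outside the outline recolors everything around the rectangle instead.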


Button Customization 


The option to edit button appearance is also available for items in the Commands list
that have default images. In fact, you can edit the button appearance, and do a lot
more, for any menu, menu item, or toolbar button. (A menu item is essentially a button
with text.) Simply right-click on the desired button while the Customize dialog is open.


The following context menu pops up: 


Reset to Default
Copy Button Image
Delete
Button Appearance...
Image
Text
Image and Text
Start Group


Using this menu, you can: 


Reset to Default. Resets the button appearance to its default state. The default state 
for menu items without default images is the text displayed in the Commands list. 


Copy Button Image. Copies the button image to the clipboard. You can then paste 
this in the Picture area while creating new images. 


Delete. Deletes the button. Alternatively, you can simply drag a button out of the
toolbar area to delete it. Note that if you delete default buttons, you can retrieve them
only by pressing the Reset or Reset All buttons in the Toolbars and Menu tabs of the
Customize dialog.


Button Appearance. Pops up the Button Appearance dialog. Use it as explained 
above to customize the selected button. 


Image, Text or Image and Text. Sets the button appearance to show the specified 
image alone, text alone or both image and text. 
Start Group. Inserts a separator before the selected button. This is equivalent to
dragging the button slightly to the right. 


SYSTAT offers over 250 buttons, organized into 32 default toolbars, to provide
immediate access to most tasks. Since showing all of these buttons or toolbars would
greatly reduce the area available for output and commands, only five default toolbars,
with functionality designed to appeal to most users, are set up to appear in the user
interface when SYSTAT is installed. The default buttons on each of these five toolbars
are:


Menu Bar. File, Edit, View, Data, Utilities, Graph, Analyze, Advanced, Quick 
Access, Window, and Help. 

Standard. New, Open, Save, Save All, Cut, Copy, Paste, Print, Print Preview, Full 
Screen Viewspace, View/Hide Workspace, View/Hide Commandspace, 
Customize, Recent Dialogs, Submit from File List, Start/Stop Recording, Play 
Recording, Help, and Data Editor Cell Entry. 

Format Bar. Font, Font Size, Block Format, Bold, Italic, Underline, Text Color, 
Numbered List, Bulleted List, Outdent, Indent, Align Left, Align Center, Align 
Right, Insert Image, Hyperlink, Page Break, Font (Dialog), and Expand All. 
Graph. Bar Chart, Line Chart, Pie Chart, Histogram, Box Plot, Scatterplot, 
SPLOM, Function Plot, and Map. 

Statistics. Column Statistics, Two-Way Tables, Two Sample t-Test, ANOVA: 
Estimate Model, Design of Experiments Wizard, Correlations, Least-Squares 
Regression, Classical Discriminant Analysis, and Nonlinear: Estimate Model. 


The Format Bar and two more toolbars, namely Data and Graph Editing, can be opened 
through the context menu of the Output Editor, Data Editor and Graph Editor tabs 
respectively. The Data and Graph Editing toolbars have the following buttons: 


Data. Variable Properties, Insert Variable(s), Delete Variable(s), Insert Case(s), 
Delete Case(s), Find Variable, Go To, Data Editor Cell Entry, First Case in 
Column, Previous Case in Column, Next Case in Column, Last Case in Column, 
and Invert Case Selection. 

Graph Editing. Copy Graph, Graph View, Page View, Text Tool Font, Drawing 
Attributes, Pointer Tool, Draw Line, Draw Polyline, Draw Arrow, Draw 
Rectangle, Draw Circle, Draw Ellipse, Text Tool, Pan, Zoom In, Zoom Out, Zoom 
Selection, Reset Graph, Graph Tooltips, Highlight Point, Region Selection, Lasso
Selection, and Show Selection. 


One or more of these buttons can be deleted and new ones can be added as described
previously, but the toolbars themselves cannot be deleted. They can, however, be
closed. The Format Bar, Data, and Graph Editing toolbars can be closed by
right-clicking the Output Editor, Data Editor, and Graph Editor tabs respectively and
unchecking 'Show Toolbar'; repeat the same steps to display them again. Other
toolbars (and also these) can be displayed or closed using the Toolbars tab of the
Customize dialog.


Positioning Toolbars 


Toolbars can be docked to pane borders or left “floating” in front of the user interface.
To move a toolbar, click the handlebar at the left or top and drag the toolbar to the
new location.


Dragging a toolbar to the left or right side of a docked pane attaches (docks) the
toolbar vertically to that side.

Dragging a toolbar to the top or bottom of a docked pane attaches (docks) the
toolbar horizontally.

Dragging a toolbar anywhere other than window borders creates a detached,
floating toolbar. Alternatively, you can hold down the Ctrl key while dragging to
prevent toolbar docking. Clicking the close button in the upper right corner closes
a floating toolbar.


Toolbar Customization 


The Toolbars tab of the Customize dialog enables you to close or display SYSTAT 
toolbars, as well as create new toolbars. 
Figure: Customize dialog box, Toolbars tab


The Toolbars list contains the names of the available toolbars prefixed by check boxes.
Notice that Menu Bar, Standard, Format Bar, Graph and Statistics are checked by
default, and that Menu Bar cannot be unchecked. To close any toolbar other than the
Menu Bar, click its check box to remove the checkmark. Likewise, to display a
toolbar, check the corresponding name in the list.


Apart from using the 32 built-in toolbars, you can create your own toolbars. Press the
New button, enter the desired name, and press OK. The toolbar appears in front of the
dialog. Drag it to the desired location or leave it floating in front of the interface.
Then drag and drop the desired menus, menu items, or toolbar buttons from other
toolbars, or from the Commands list in the Commands tab, into the new toolbar.

To reset any toolbar to its default state, select its name in the Toolbars list and press
the Reset button. To reset all toolbars, press the Reset All button.

To rename or delete a toolbar that you have created, press the Rename or Delete
button respectively.


The Toolbars tab also offers optional button appearance features:

Show tooltips. Displays the button name when the mouse pauses on a button.

With shortcut keys. Displays the shortcut key sequence to be pressed to invoke the
same feature, along with the button tooltip.


Keyboard Shortcuts


Although SYSTAT runs in a Windows environment, many users find manipulating the 
mouse to be an annoyance. Fortunately for these users, every menu item can be 
accessed using the keyboard. 


The F10 key activates the File menu. Once activated, use the arrow keys to navigate 
through the menu system. The up and down arrows scan vertically through the active 
menu. The left and right arrows open submenus or move between menus. Use Enter to 
execute a selected item. 

SYSTAT also offers shortcut and access keys for keyboard control of the SYSTAT 
interface. 


Shortcut (Accelerator) Keys. In general, shortcut keys involve holding down the Ctrl
key with a single letter to perform a specific task. Most shortcut key combinations
appear on the menus after the equivalent entry. Shortcut key behavior may depend on
the active window. For example, Ctrl + P prints the content of the Output Editor if it is
active, but prints a graph if the Graph Editor is active. The following shortcut keys are
available:
Table 1: Shortcut keys

Pane/Tab              Shortcut Key                Function
(Any)                 Ctrl + N                    create a new file in the active tab.
                      Ctrl + O                    open a file in the active tab.
                      Ctrl + I                    open a data file.
                      Ctrl + Shift + I            import a data file from a database.
                      Ctrl + S                    save the content of the active tab.
                      Ctrl + Alt + S              save all open files.
                      Ctrl + D, Ctrl + E          save the current data.
                      Ctrl + Q                    quit the SYSTAT application.
                      Ctrl + X                    cut the selection, placing its contents on the clipboard.
                      Ctrl + C, Ctrl + Insert     copy the selection to the clipboard.
                      Ctrl + V, Shift + Insert    paste the clipboard contents at the current location.
                      Del                         delete the current selection.
                      F6                          invoke the Global Options dialog.
                      Ctrl + 0                    launch a full screen view of the Viewspace.
                      Ctrl + Shift + O            activate the Output Editor.
                      Ctrl + Shift + D            activate the Data Editor.
                      Ctrl + Shift + G            activate the Graph Editor.
                      F4                          invoke the Customize dialog.
                      Ctrl + G                    invoke the Graph Gallery.
                      Ctrl + Alt + F              invoke the Graph: Function Plot dialog.
                      Ctrl + 1                    activate the Workspace.
                      Ctrl + 2                    activate the Viewspace.
                      Ctrl + 3                    activate the Commandspace.
                      Ctrl + Tab                  move the focus between the three spaces of the user interface (this shortcut does not cycle between the three tabs of the Commandspace).
                      Ctrl + Alt + Tab            cycle forward (to the right) through the tabs of the active space.
                      Ctrl + Alt + Shift + Tab    cycle backward (to the left) through the tabs of the active space.
                      Ctrl + Home                 move the cursor to the top of the active tab.
                      Ctrl + End                  move the cursor to the end of the active tab.
                      F10                         activate the File menu.
                      Esc                         close an open dialog box.
Output Editor         Ctrl + Shift + Alt + P      specify the printer, paper size, source and orientation to be used for printing.
                      Ctrl + Alt + P              preview the content of the Output Editor before printing.
                      Ctrl + P                    print the content of the Output Editor.
                      Ctrl + Z, Alt + Backspace   undo the last few editing steps, one step at a time.
                      Ctrl + Y                    redo the last few undone editing steps, one step at a time.
                      Ctrl + F                    find text.
                      F3                          find the next instance of the text specified for the search.
                      Ctrl + H, Ctrl + R          replace text.
                      Ctrl + A                    select the entire contents of the active tab.
                      Ctrl + Shift + F            set the font of subsequently typed (not generated) or selected text in the Output Editor.
Data/Variable Editor  Ctrl + Shift + Alt + P      specify the printer, paper size, source and orientation to be used for printing.
                      Ctrl + Alt + P              preview the data/variable information before printing.
                      Ctrl + P                    print the data/variable information.
                      Ctrl + Z, Alt + Backspace   undo up to 32 editing steps, one step at a time.
                      Ctrl + Y                    redo up to 32 editing steps, one step at a time.
                      Ctrl + F                    locate a variable in the Data Editor.
                      Ctrl + H, Ctrl + R          replace instances of a string in a given column.
                      Ctrl + A                    select the entire contents of the active tab.
                      Alt + Insert                add empty rows in the Data Editor (appended at the end of the file if one is already open).
                      Ctrl + Shift + Insert       insert variables in the Data Editor before or after a selected column.
                      Ctrl + Shift + Del          delete the selected variables in the Data Editor.
                      Ctrl + Shift + P            open Variable Properties for the current column.
                      Shift + Del                 cut the selected variable or case.
Graph Editor          Ctrl + P                    print the graph that is in the Graph Editor.
                      Del                         delete any annotation that you may have created.
Commandspace          F7                          submit the contents of the active tab of the Commandspace.
                      Ctrl + L                    submit the command line on which the cursor is currently positioned.
                      F8                          submit the selection in the active tab of the Commandspace.
                      Ctrl + F7                   submit a command file.
                      Ctrl + Shift + V            submit the contents of the clipboard.
                      Ctrl + Shift + Alt + P      specify the printer, paper size, source and orientation to be used for printing.
                      Ctrl + Alt + P              preview the output before printing.
                      Ctrl + P                    print data.
                      Ctrl + Z, Alt + Backspace   toggle between undoing and redoing the last editing step.
                      Ctrl + Y                    redo the step that was last undone.
                      Ctrl + F                    find text.
                      F3                          find the next instance of the text specified for the search.
                      Ctrl + H, Ctrl + R          replace text.
                      Ctrl + A                    select the entire contents of the active tab.
                      Ctrl + Shift + F            set the font to be used in the active tab.
                      F9                          recall commands from the command buffer one by one, starting from the latest.
                      Ctrl + W                    toggle the visibility of the Commandspace.


Access keys. Access keys provide an alternative to accelerator keys for accessing menu
entries. Access keys open menus using the Alt key and allow navigation to selected
entries using designated letters. The name of each menu contains one underlined
letter. Pressing Alt and the underlined letter opens the corresponding menu. After
opening a menu, you can execute any of the displayed entries. Like menu titles, each
menu entry contains one underlined letter. Pressing this letter runs the entry as if it
had been selected using the mouse.

The list of access keys is too long to be displayed here. To view the key required for a
particular menu entry, open the menu and scan through the underlined letters. You will
quickly become familiar with the keys for the procedures and graphs you use frequently.
Figure: Customize dialog box, Keyboard tab


Keyboard Shortcut Customization 


The default keyboard shortcuts may be changed and new keyboard shortcuts can be 
defined using the Keyboard tab of the Customize dialog. 


Category. Lists all the menus in the Menu Bar, plus one entry that groups all
commands together.


Commands. Lists all the menu items under the menu selected in Category. Select a 
command to see its description in the Description area. 


Current keys. Displays the keyboard shortcut(s) already assigned (either by SYSTAT
or by you) to the command selected in Commands. If you do not want to use an existing
keyboard shortcut, select it and press the Remove button to remove the assignment.
To reset keyboard shortcuts for all commands to their default assignments, press
Reset All.


Press new shortcut key. Press the desired shortcut key or key combination for the
selected command. The key name is displayed in this area automatically as you press
it. Key combinations must begin with Shift, Ctrl, Alt, or any combination of these, and
end with one other key. When you are satisfied with the key combination you have
typed, press Assign. You can define more than one keyboard shortcut for a command.


If a key combination you have typed in the new shortcut key area has already been
assigned to some other command, that command is displayed in the Assigned to area,
and the Assign button is disabled. Also, the new shortcut key area does not register
external keyboard shortcuts, since such shortcuts may also be useful while working
with SYSTAT; pressing one of them performs the associated external task instead.
For instance, Alt + Tab is a Windows shortcut that lists all open windows, allowing
you to select one by holding Alt down and repeatedly pressing Tab. This offers quick
navigation between the SYSTAT user interface and any other program you may be
running concurrently.
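The assignment rules above (a combination begins with Shift, Ctrl, Alt, or any mix of them and ends with one other key; a combination already used by another command cannot be assigned; external shortcuts are never registered) can be sketched as a small lookup table. This is a hypothetical model for illustration only: `parse_combo`, `ShortcutTable`, and the "Ctrl + P" text format are assumptions, not part of SYSTAT.

```python
MODIFIERS = {"Shift", "Ctrl", "Alt"}


def parse_combo(text):
    """Split text such as "Ctrl + Shift + F" into (modifiers, key).

    Modifiers are returned as a frozenset, so "Shift + Ctrl + F" and
    "Ctrl + Shift + F" compare equal. Raises ValueError when the text does
    not end with exactly one non-modifier key, or when a leading part is not
    a distinct Shift/Ctrl/Alt modifier.
    """
    *mods, key = [part.strip() for part in text.split("+")]
    if not key or key in MODIFIERS:
        raise ValueError(f"{text!r} must end with one non-modifier key")
    if any(m not in MODIFIERS for m in mods) or len(set(mods)) != len(mods):
        raise ValueError(f"{text!r} may only begin with Shift, Ctrl, or Alt")
    return frozenset(mods), key


class ShortcutTable:
    """Hypothetical model of the Keyboard tab's shortcut assignments."""

    def __init__(self, external=()):
        self.assigned = {}  # parsed combination -> command name
        # Combinations owned by other programs (e.g. Alt + Tab) are never registered.
        self.external = {parse_combo(c) for c in external}

    def assigned_to(self, text):
        """Return the command already using this combination, or None."""
        return self.assigned.get(parse_combo(text))

    def assign(self, command, text):
        """Assign a combination; return False if another command owns it."""
        combo = parse_combo(text)
        if combo in self.external:
            raise ValueError(f"{text!r} is an external shortcut")
        owner = self.assigned.get(combo)
        if owner is not None and owner != command:
            return False  # corresponds to the disabled Assign button
        self.assigned[combo] = command  # a command may hold several combinations
        return True

    def remove(self, text):
        """Remove an assignment, as the Remove button does."""
        self.assigned.pop(parse_combo(text), None)
```

Storing the modifiers as a frozenset is one simple way to make the order in which they are typed irrelevant; SYSTAT's own bookkeeping may well differ.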

Access Key Customization. The access key for a menu item is indicated by an
ampersand typed before the underlined letter in the Button text area of the Button
Appearance dialog box. To change the access key, move the ampersand so that it
immediately precedes the desired letter in the caption. Take care not to create
duplicate access keys.
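Finding the access key in a caption, and spotting duplicates among sibling captions, can be sketched as follows. This is an illustration under stated assumptions, not SYSTAT code: the function names are hypothetical, and treating `&&` as a literal ampersand follows the usual Windows menu convention rather than anything stated above.

```python
from collections import defaultdict


def access_key(caption):
    """Return the letter marked by '&' in a caption (upper-cased), or None.

    Assumption: '&&' stands for a literal ampersand, as in Windows menus.
    """
    i = 0
    while i < len(caption):
        if caption[i] == "&":
            if i + 1 < len(caption) and caption[i + 1] != "&":
                return caption[i + 1].upper()
            i += 2  # skip a literal '&&' or a trailing '&'
            continue
        i += 1
    return None


def duplicate_access_keys(captions):
    """Map each access key used by more than one caption to those captions."""
    by_key = defaultdict(list)
    for caption in captions:
        key = access_key(caption)
        if key is not None:
            by_key[key].append(caption)
    return {key: names for key, names in by_key.items() if len(names) > 1}
```

For instance, the captions "&File" and "&Format" in the same menu would both claim F, which is exactly the kind of clash the text warns against.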

Menu Customization


SYSTAT has several context menus that pop up on right-click in various parts of its
user interface. Use the Menu tab of the Customize dialog box to customize these
menus, as well as set a few other options.
Figure: Customize dialog box, Menu tab (Application frame menus and Context menus groups)


Reset. The default menu structure of SYSTAT may be modified according to the user's
preferences and needs, as described earlier. Use the Reset button to reset the menu
structure to its default state.


Context menus are available for the Startpage, Output Editor, Data Editor (columns,
rows and cells), Graph Editor, Output Organizer (data, view data, graph, other, and
main), Examples (folder and node), Interactive, Batch, and Log tabs of the
Commandspace, and status bar. To customize a context menu, select it from the
drop-down list (or right-click in the associated pane) so that it pops up. Customize it as
you would customize any other menu or toolbar. If you drag and drop toolbar buttons,
the associated text is automatically displayed (you cannot display only button images
here). Any changes are applied immediately. Press the Reset button in the Context
menus group to reset the selected context menu to its default state. Press the Close
button at the top right corner, or close the Customize dialog, to close the popped-up
menu.


Font. Select the desired font and font size to be used for all the menu items. 


Menu animation. By default, all SYSTAT menus pop up immediately when clicked. You
may leave them that way or choose one of the two available animation effects:
Unfold and Slide.
Select context menu. Select the context menu that you want to customize. Press Reset
to reset any changes you may have made to the selected context menu to the installation
default.


Popup menu. Use this to create new popup menus in the Menu Bar. Enter the name
of the popup menu and press Create. The new menu is added as the first item in the
Menu Bar. Drag and drop the menu to wherever you want it to be.


Command File Lists 


Command files can be saved in any folder. If you organize your files by projects, each
folder will most likely contain data, output, and command files. This approach groups
related command files together, but may result in similar files appearing in several
project folders. Alternatively, you can store files by type, resulting in a single folder
containing only command files. In either situation, finding a particular command file
can be difficult. The Command File List dialog provides a command file classification
scheme that is independent of your folder structure. Using this dialog box, you create
lists of command files having some element in common, such as "Charts with Error
Bars". A list can then be associated with the Submit From File List toolbar button or
menu item for immediate processing of any file it contains.

To open the Command File List dialog box, from the menus choose:


Utilities
User Menu
Command File List...
Utilities: User Menu: Command File List


Lists. Displays all defined command file lists. Select a list to view the names of all
command files assigned to it in the List Contents list. You can define lists or remove
defined lists as described below. Selecting a list also assigns it to the Submit From
File List button and menu item; SYSTAT automatically links the two. You can change
the list assigned to the toolbar button at any time by selecting a different list.


List Contents. Displays the names of the command files assigned to the selected list.
You can assign files to the list or remove assigned files from it. For example, suppose
you have a file in C:\Folder1 that produces a plot of residuals against predicted values
and another file in D:\Folder2 that produces a probability plot of residuals. You can
assign both files to a list called "Regression Diagnostics". The only requirement is that
the files be text-based.


Modify the index of command file lists or the contents of any list using the two 
customization tools. For the index of command file lists, these buttons have the 
following functions: 


Insert Row. Creates a new command file list. Alternatively, right-click in the Lists
header and select Insert Row. Once a row is created, you can also press the Enter key
to create more rows. After inserting a row, type a name for the new list. The default
name is List; replace it with a suitable name. The name must be unique. Click on the
row and press the Delete key if you want to clear a name. Press the Enter key or click
outside the row to assign the name to the new list.

Delete Row. Deletes the selected list. Alternatively, right-click on the list and select
Delete Row.

For the set of command files in a list, the two buttons have the following functions: 

New. Adds a file to the selected list. When adding a file to a list, press the ellipsis
(...) button at the right of the new entry to browse for a particular file. Alternatively,
type the path and filename into the list of command files. SYSTAT automatically adds
the currently defined path for command files to any typed filename that lacks a path.

Delete. Deletes the selected command file from the list. The command file is
deleted from the list only; the file is not deleted from the user's system.
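The rule for typed filenames (a bare filename gets the current command-file path, while an entry that already has a path is kept as typed) can be sketched with Python's `ntpath` module, which applies Windows path rules on any operating system. The function name and the example folder and file names are hypothetical; this only illustrates the behavior described above.

```python
import ntpath


def resolve_command_file(entry, command_dir):
    """Return entry, prefixed with command_dir when it has no path of its own."""
    # ntpath.dirname is empty only when the entry has no folder or drive part.
    if ntpath.dirname(entry):
        return entry
    return ntpath.join(command_dir, entry)
```

With a command-file path of C:\Commands, typing a bare name such as resid.syc would resolve to C:\Commands\resid.syc, while D:\Folder2\probplot.syc would be kept unchanged.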


Submission From File Lists 


In addition to offering a mechanism for organizing files, command file lists also allow
submission of the files contained in the lists. As a result, you can create templates for
custom graphs, assign them to a file list, and apply them to the current data with a
mouse click.

Use the Submit from File List button on the Standard toolbar to submit files from
previously defined command file lists.
Alternatively, from the menus choose: 


File 
Submit 
From Command File List... 


This presents the names of all files in the command file list that is currently selected
in the Command File List dialog. The display contains only the filename, not the path.
As a result, some lists may contain multiple entries with the same name that invoke
different command files. Using unique names for command files avoids this potentially
confusing situation.

Selecting a file from the displayed list submits the corresponding file for processing. 
The commands contained in the file do not appear on the middle tab of the 
Commandspace; file submission does not affect this tab. As a result, you can have a 
command file open and submit a second file at the same time. 
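Because the submission display shows only filenames, two list entries from different folders can look identical. A quick check for such clashes might look like the sketch below; the function names and paths are hypothetical, and the comparison is case-sensitive for simplicity even though Windows filenames are not.

```python
import ntpath
from collections import Counter


def display_names(paths):
    """Names shown when submitting from a list: the filename without its path."""
    return [ntpath.basename(p) for p in paths]


def ambiguous_names(paths):
    """Return, sorted, the displayed names that belong to more than one entry."""
    counts = Counter(display_names(paths))
    return sorted(name for name, n in counts.items() if n > 1)
```

Renaming one of the files reported by `ambiguous_names` is the fix the text recommends.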
Command file lists and the list of recent command files appearing on the File menu
offer similar functionality, but differ in several notable ways. First, command file lists 
allow you to group your files into categories, whereas file lists based on recency of use 
do not. Second, you can create multiple command file lists, each having an unlimited 
number of entries. The recent command list allows only nine entries. Third, the 
structure of command file lists persists across sessions, but lists of recent files change 
each time you open a file. Finally, command file lists submit the selected file for 
processing. The recent file list merely opens the file on the middle tab of the 
Commandspace. 


Recent Dialogs 


SYSTAT provides quick, easy access to frequently used dialog boxes. Every time you 
use (invoke and execute) a dialog from the Data, Graph, Analyze, Advanced or Quick 
Access menus, or even from the corresponding DIALOG command, it is added to the 
list of recently used dialog boxes. This list persists across SYSTAT sessions, so if you 
consistently use the same set of dialog boxes, they are always just a click away. Simply 
click the Recent Dialogs button on the Standard toolbar, or from the menus 
choose: 
Utilities 
Recent Dialogs... 
Selecting an item from the list presents the corresponding dialog box. All options and 
variable lists in the recalled dialog box reflect your specifications from the last use of 
that dialog. However, opening a different data file changes the variables available for 
an analysis and consequently resets all dialog boxes to their default settings. 

SYSTAT automatically updates the list of dialog boxes during your sessions. The 
list contains up to fifteen dialog boxes, ordered according to recency of use. Each use 
of a dialog box results in a corresponding entry at the top of the Recent Dialogs list. 
Any other instance of that dialog in the list is removed. As a result, no dialog box 
appears in the list more than once. If your list contains fifteen entries and you use a 
dialog box not appearing in the list, SYSTAT adds the new dialog to the top of the list 
and removes the oldest entry. 
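The update rule described above behaves like a bounded most-recently-used list. The Python sketch below models it under that reading, including the removal of contingent dialogs when a data file is opened. The function names record_dialog_use and drop_contingent are hypothetical and do not correspond to any SYSTAT API.

```python
from typing import List

MAX_RECENT_DIALOGS = 15  # list size stated in the manual


def record_dialog_use(recent: List[str], dialog: str,
                      limit: int = MAX_RECENT_DIALOGS) -> List[str]:
    """Return the Recent Dialogs list after `dialog` has been used."""
    # Put the dialog at the top and drop any older entry for it,
    # so no dialog appears in the list more than once.
    updated = [dialog] + [name for name in recent if name != dialog]
    # If the list overflows, the oldest entry (at the end) is removed.
    return updated[:limit]


def drop_contingent(recent: List[str], contingent: set) -> List[str]:
    """Remove dialogs that need prior results (such as Hypothesis Test)
    when a data file is opened."""
    return [name for name in recent if name not in contingent]


if __name__ == "__main__":
    recent: List[str] = []
    for name in ["Regression", "t-Test", "Regression"]:
        recent = record_dialog_use(recent, name)
    print(recent)  # ['Regression', 't-Test']
```

Keeping the newest entry at index 0 means that truncating the list always discards the least recently used dialog, which matches the behavior described for a full fifteen-entry list.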

Some main dialog boxes require preliminary results before they can be used. For 
instance, the Hypothesis Test dialog can only be used after estimating a model 
successfully. These contingent dialogs do appear in the Recent Dialogs list, but are 
removed each time a data file is opened. 
Although the goal of Recent Dialogs is to present the most recently used dialogs, 
some main dialogs do not appear in the list. The Variable Properties and Add Empty 
Rows dialog boxes, for example, do not receive list entries. Furthermore, wizards that 
result in a sequence of dialogs only receive an entry for the first dialog of the sequence. 


Note: Because most dialog boxes require variable specifications, Dialog Recall is 
disabled if there is no open data file. 


User Menus 


SYSTAT's menus offer a dialog interface to most of the underlying command 
language. You can also create an additional menu with entries designed to process sets 
of commands that you frequently run. To add a user menu item, from the menus 
choose: 
Utilities 
User Menu 
Add/Delete/Modify... 




Menu item. Displays all the menu item names that are currently defined. Use the insert 
and delete buttons to add new items and remove unwanted items, respectively. The 
names in this list are displayed under the Menu List submenu of User Menu. You can 
define any number of menu items here, but the Menu List displays only the first 30. 


Associate each menu item you define with one of the following: 


File. Displays the SYSTAT command filename, if any, associated with the currently 
selected menu item name. To specify a different filename, or when you define the 
menu item for the first time, type the name of a command file including its path, or 
click the browse button and locate the file. 


User input. Displays the set of commands, if any, associated with the currently selected 
menu item name. Edit existing commands or type a new set of commands just as you 
would in the Commandspace. You may want to type one or more DIALOG commands 
here to open frequently used dialog boxes, or a command template that you can apply 
to various data files. 


Status bar. Displays the status bar help content currently associated with the selected 
menu item. You can edit existing content or type new content. 


Tooltip. Displays the tooltip that will appear on mouse hover if the selected menu item 
is placed on a toolbar. You can edit an existing tooltip or type a new one. 


Bubble Help. Displays the Bubble help content currently associated with the selected 
menu item. You can edit existing content or type new content. 

Another way to create a user menu item is to use the Record Script feature. This 
feature automatically creates a menu entry if you request it, and associates the entry 
with the command script it has just recorded. You can see the menu item list and the 
recorded set of commands when you subsequently open the User Menu Profile dialog. 
For more information about this feature, see Command Language. 


To access a menu item created using the Add/Delete/Modify dialog or Record Script 
feature, from the menus choose: 
Utilities 
User Menu 
Menu List... 


and, under this, the corresponding menu item name. Clicking the name will execute the 
underlying set of commands. 
Keyboard shortcuts. Any user menu item can be accessed using the keyboard by 
pressing the underlined number preceding its name (the full sequence would be 
ALT + U, U, L, the underlined number). 


Themes 


The themes feature of SYSTAT allows you to create, store, and apply any number of 
fully customized interface themes, each with its own set of menu items and toolbars as 
well as the position and size of spaces, content of the status bar, and keyboard 
shortcuts. Themes are useful if you do not need some of the menu items at all, if you 
prefer a different menu arrangement or terminology, if you work with just a subset of 
all the data processing, analysis, and graphing techniques available in SYSTAT, or if 
you work with one of several sets of features at different times. For instance, if you 
teach Statistics courses ranging from basic to advanced, execute projects for various 
industries, or do research in application areas such as Psychology, Engineering, or 
Chemistry, you can create one theme for each case and apply the appropriate theme as 
required. 


You can save the changes you make to the default theme or to any existing theme of 
SYSTAT in a theme file. To do this, from the menus choose: 
Utilities 
Themes 
Save Current Theme... 
In the dialog that pops up, enter a suitable file name, and press Save. All menu items, 
status bar content, toolbar layout and location, as well as those of the Workspace, 
Viewspace and Commandspace will be saved in this file. By default, the file will be 
saved to the Themes folder of SYSTAT. You may specify a different folder to save to; 
the advantage of saving in the Themes folder is that the theme will be listed in the 
Themes section of the Startpage. The name of the theme will be the same as the 
filename; you simply have to double-click the desired theme name to apply it. In any 
case, to apply any stored menu theme, from the menus choose: 
Utilities 
Themes 
Apply Theme... 
Navigate to your themes folder, select the desired file and press Open. 
New themes will be available on the SYSTAT server from time to time. To download 
these, from the menus choose: 
Utilities 
Themes 
Download Themes... 


Any new theme will be automatically downloaded to the Themes folder. 
To revert to the default menu theme, from the menus choose: 
Utilities 
Themes 
Apply Default Theme... 


Global Options 


SYSTAT has a host of global settings that you can customize according to your 
preferences. These settings are automatically saved at the end of a session, and remain 
in effect for subsequent sessions. Most of them can also be accessed through the Global 
Options toolbar or the status bar. To open the Global Options dialog box, from the 
menus choose: 


Edit 
Options... 


The six tabs in the Options dialog box control different settings in SYSTAT. 
General. Specify general appearance and behavior options. 

Data. Specify Data Editor display options. 

Output. Specify the general appearance of output. 


Output Scheme. Specify font and color for individual components of the output, as 
well as the background image or color for all of the output. 


Graph. Specify graph scaling, line thickness, character size, and measurement units for 
all subsequent graphs. 


File Locations. Set folders in which SYSTAT should look for files of different types. 


The General, Output, Output Scheme, and File Locations tabs are described here. For 
information about Data options, see SYSTAT Data. For information about Graph 
options, see SYSTAT Graphics. 
General Options 


The General tab of the Global Options dialog controls the ordering of variables in 
dialog boxes, random number generation, command recall, Bubble Help, command 
autocompletion, token processing, and several behaviors related to processing and 
saving. 


Figure: Edit: Options dialog, General tab 


Sort variable lists by. You can sort source variable lists in dialog boxes by file order or 
alphabetical order. For data files with a large number of variables, it is often easier to 
find variables in source lists if the variables are sorted alphabetically. If variables are 
grouped together in the file for a specific reason, it may be easier to select related 
groups of variables if the variables are sorted in file order. 


Random number generation. SYSTAT provides two algorithms for generating random 
numbers: 
■ Mersenne-Twister. This is believed to have a far longer period and far higher order 
of equidistribution than other random number generators. It is the recommended 
option, especially for Monte Carlo studies. 


■ Wichmann-Hill. This generates random numbers by a triple modulo method, combining 
the outputs of three multiplicative congruential generators. 


Mersenne-Twister (MT) is the default option. We recommend the MT option, 
especially if the number of uniform random numbers to be generated for your Monte 
Carlo exercise is large, say more than 10,000. 


If you would like to reproduce results involving random number generation from 
earlier SYSTAT versions, with old command files or otherwise, make sure that your 
random number generation option is Wichmann-Hill (and, of course, that your seed is 
the same as before). 


For more details, see Chapter 4 (Data Transformations) of the Data volume and user 
documentation on Monte Carlo if you have the Monte Carlo add-on module. 
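To make the two options concrete, the Python sketch below implements the textbook Wichmann-Hill generator (the 1982 published algorithm, AS 183) and shows that identical seeds reproduce an identical stream, which is the property you rely on when matching results from earlier SYSTAT versions. It is not SYSTAT's code: SYSTAT's seeding and internal state are not documented here, so this sketch will not reproduce SYSTAT's numbers. Python's own random.Random is a Mersenne Twister (MT19937) generator and is used only for comparison.

```python
import random


class WichmannHill:
    """Textbook Wichmann-Hill generator: three small multiplicative
    congruential generators whose scaled outputs are summed modulo 1.
    Illustrative only; this is not SYSTAT's implementation or seeding."""

    def __init__(self, s1: int, s2: int, s3: int) -> None:
        # Seeds are assumed to be positive integers (1..30000 in AS 183).
        self.s1, self.s2, self.s3 = s1, s2, s3

    def random(self) -> float:
        self.s1 = (171 * self.s1) % 30269
        self.s2 = (172 * self.s2) % 30307
        self.s3 = (170 * self.s3) % 30323
        # Combine the three streams; the fractional part lies in [0, 1).
        return (self.s1 / 30269 + self.s2 / 30307 + self.s3 / 30323) % 1.0


if __name__ == "__main__":
    # Same seeds give the same sequence.
    a, b = WichmannHill(1, 2, 3), WichmannHill(1, 2, 3)
    print([round(a.random(), 6) for _ in range(3)])
    print([round(b.random(), 6) for _ in range(3)])  # identical to the line above

    # Python's random.Random uses the Mersenne Twister algorithm.
    mt = random.Random(12345)
    print(mt.random())
```

In both cases the seed fully determines the sequence; the algorithms differ in period length and equidistribution, which is why the manual recommends Mersenne-Twister for large Monte Carlo runs.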


Command buffer. The command buffer contains the most recently processed 
commands. Use this buffer for quick recall, modification, and resubmission of 
commands using the F9 key. The number of commands to keep defines the size of the 
buffer; use the up and down arrows to adjust the number of retrievable command lines. 
The software uses the buffer to store commands generated from any of the following 
sources: 


■ From the command prompt. Commands submitted using the Interactive tab of the 
Commandspace. 


■ Submitted from files, the Commandspace, or clipboard. Commands submitted 
from the middle and Log tabs of the Commandspace. This option also includes 
commands submitted directly from the Windows Clipboard and command files 
submitted via the SUBMIT command. 


■ Submitted by dialogs. Commands generated after clicking the OK button in any 
dialog. Select this option to use the dialog interface to generate a command line that 
you expect to refine iteratively. 


Bubble Help. Apart from the help provided on the status bar about each menu item, a 
more detailed description is provided in a "bubble" that appears when you pause the 
mouse on the menu item for a few seconds. You can specify the number of seconds to 
pause the mouse before the help appears, or even turn off the help completely. 


Autocomplete commands. As you type commands in any tab of the Commandspace, 
you will be prompted with the possible command keywords, available data files, or 
available variables. The data files in the folder specified under Open data in the Global 
Options dialog will be listed if you type "USE ". The available variables in the 
currently opened data file will be listed when you type or select any other command 
keyword followed by a space. This feature is enabled by default. You can turn it off if 
you do not want commands to be autocompleted. 


Perform substitutions specified by TOKEN commands. With this option selected, 
SYSTAT treats the ampersand (&) character as a token indicator. During processing, 
predefined or user-specified values replace every '&' and the text immediately 
following it. Deselect this option to prevent these substitutions. 
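The Python sketch below illustrates the kind of substitution described above. It is a hypothetical model, not SYSTAT's implementation: it assumes that a token is "&" followed by letters, digits, or underscores, that token names are matched case-insensitively, and that undefined tokens are left unchanged. The function name substitute_tokens is invented for illustration.

```python
import re
from typing import Dict

# Assumption: a token is "&" followed by letters, digits, or underscores.
TOKEN_PATTERN = re.compile(r"&(\w+)")


def substitute_tokens(command: str, tokens: Dict[str, str],
                      enabled: bool = True) -> str:
    """Replace each &NAME in `command` with tokens[NAME] (keys in upper case)."""
    if not enabled:  # option deselected: '&' is left as ordinary text
        return command

    def replace(match: re.Match) -> str:
        name = match.group(1).upper()
        # Undefined tokens are left untouched rather than silently dropped.
        return tokens.get(name, match.group(0))

    return TOKEN_PATTERN.sub(replace, command)


if __name__ == "__main__":
    values = {"DATA": "EGYPTDM"}
    print(substitute_tokens("USE &data", values))  # USE EGYPTDM
```

Deselecting the option corresponds to enabled=False, in which case the command text passes through unchanged.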


Show Cancel dialog to terminate lengthy processing. Whenever processing by 
SYSTAT takes some time before results can be displayed, a Cancel dialog pops up so 
that you can cancel processing. You may want to uncheck this option to avoid 
accidental cancellation of a process. 


Prompt to save all documents while quitting SYSTAT. By default, SYSTAT 
prompts you to save all open documents, including any new unsaved data and 
commands that you may have entered, when you quit the application. You may want to 
uncheck this option when you run the application unattended in batch mode. 


Link data files to output file. When a SYSTAT output file is saved, the data files are 
linked to the output file. This means you can open an output file saved in a previous 
session and continue working with it, provided the underlying data files exist in the 
same path. Uncheck this option if you do not want to use output files across sessions. 


Save command log in output file. When a SYSTAT output file is saved, the command 
log is also saved with it. This means you can open an output file saved in a 
previous session and reuse the commands from that session. Uncheck this option if 
you do not use output files across sessions. 


Output Options 


The Output tab of the Global Options dialog determines the format and content of 
subsequently created output. 
Figure: Edit: Options dialog, Output tab 


Data/Output format. These settings control the default display of numeric data in the 
Data Editor and in the output. Field width is the total number of digits in the data value, 
including decimal places. Exponential notation is used to display very small values. 
This is particularly useful for data values that might otherwise appear as 0 in the chosen 
data format. For example, a value of 0.00001 is displayed as 0.000 in the default 12.3 
format but is displayed as 1.00000E-5 in exponential notation. A number that would 
otherwise violate the specified field width will also be converted to exponential 
notation while maintaining the number of decimal places. Individual variable formats 
in the Data Editor override the default setting. 
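The display rule above can be approximated in a few lines of Python. This is a sketch under stated assumptions, not SYSTAT's formatter: format_value is a hypothetical name, and the exponent layout follows Python's conventions (for example 1.000E-05 rather than SYSTAT's 1.00000E-5). It switches to exponential notation when a nonzero value would print as zero or when the fixed-point form would not fit the field width.

```python
def format_value(x: float, width: int = 12, decimals: int = 3) -> str:
    """Approximate the fixed/exponential display rule for a width.decimals format."""
    fixed = f"{x:{width}.{decimals}f}"
    # A nonzero value that would print as zero loses all information.
    shows_as_zero = x != 0 and float(fixed) == 0.0
    # A value whose fixed-point form does not fit the field width.
    too_wide = len(fixed) > width
    if shows_as_zero or too_wide:
        # Exponential notation, keeping `decimals` digits in the mantissa.
        return f"{x:{width}.{decimals}E}"
    return fixed


if __name__ == "__main__":
    for value in (1.5, 0.00001, 1e12):
        print(repr(format_value(value)))
```

With the default 12.3 format, 1.5 prints in fixed-point form, while 0.00001 and 1e12 both switch to exponential notation.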


Default font. You can specify the font used in the output. 
■ Proportional output sets the font and font size for HTML-based output. 
■ Monospaced output sets the font and font size for output in the classic style, and for 
any output that requires a fixed-width font (which keeps text aligned automatically), 
such as stem-and-leaf diagrams. 


Output results. These settings control the display of the results of your analyses. 


■ Length specifies the amount of statistical output that is generated. Short provides 
standard output (the default). Some statistical analyses provide additional results 
when you select Medium or Long. Note that some procedures have no additional 
output. (Tip: In command mode, DISCRIM, LOGLIN, and XTAB allow you to add or 
delete items selectively. Specify PLENGTH NONE and then individually specify the 
items you want to print.) 

■ To control Width, select Narrow (80 characters; 77 characters in the HTML format and 
82 in the Classic format, for a font size of 10), Wide (132 characters; 106 characters in 
the HTML format and 113 in the Classic format, for a font size of 10), or None. This 
applies to screen output (how output is saved and printed). The wide setting is useful 
for data listings and correlation matrices when there are more than five variables. 
Selecting None prevents tables from splitting no matter how wide they are. 


Variable label display. If a variable label is defined for a variable, it is used to 
identify the corresponding variable in the output instead of the variable name itself. 
Select "Both" if you want both variable names and labels to be used, or "Name" if you 
want just the variable names to be used. 


Value label display. If value labels are defined for a variable, they will be used to 
represent the underlying data values in the output. You can select "Both" to display both 
value labels and data values, and "Data" to display just the data values. 


Image format. By default, graphs created by SYSTAT in the Output Editor are stored in 
the Portable Network Graphics (PNG) format. You can choose this format or any one of 
the following: BMP, JPG, GIF, and EMF. 


Wrap text in tables. The text in tables can sometimes be very long, especially 
when variable and/or value labels are defined. In such cases, by default, the text in each 
cell is wrapped onto multiple lines if it extends beyond 15 characters. Row 
headers are wrapped if they extend beyond three times this number, that is, 45 characters. 
You can set a different number here as desired, or uncheck this option to 
prevent wrapping. 


Truncate text in tables. Apart from wrapping, the text in tables can also be truncated. 
By default, in each cell, the truncation will happen at 45 characters. You can change 
this number or even turn off truncation. 
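The Python sketch below shows how the two default limits interact for a single table cell. It assumes that truncation is applied before wrapping and that a limit of 0 turns the corresponding option off; SYSTAT's actual order of operations is not stated in this chapter. The names layout_cell, WRAP_AT, and TRUNCATE_AT are hypothetical. To model row headers, pass wrap_at=45.

```python
import textwrap
from typing import List

WRAP_AT = 15       # default cell wrap width quoted in the manual
TRUNCATE_AT = 45   # default cell truncation length quoted in the manual


def layout_cell(text: str, wrap_at: int = WRAP_AT,
                truncate_at: int = TRUNCATE_AT) -> List[str]:
    """Return the lines a table cell would occupy (assumed: truncate, then wrap)."""
    if truncate_at:                 # 0 turns truncation off
        text = text[:truncate_at]
    if not wrap_at:                 # 0 turns wrapping off
        return [text]
    # textwrap.wrap returns [] for empty text; keep one (empty) line instead.
    return textwrap.wrap(text, width=wrap_at) or [""]


if __name__ == "__main__":
    label = "Mean maximum breadth of skull in millimetres across all periods"
    for line in layout_cell(label):
        print(line)
```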


Organize output by data nodes. By default, the output generates a new data node 
every time a data file is opened. That also implies that the output is arranged 
chronologically. You can set it to be grouped by data nodes so that all the graphs and 
analyses pertaining to a given data file are placed together. You will have to restart 
SYSTAT for this option to come into effect. 


Display statistical Quick Graphs. You can turn the display of the Quick Graphs on and 
off. By default, SYSTAT automatically displays Quick Graphs. 


Echo commands in output. Includes commands in the Output Editor before the 
subsequent output. 


Use SYSTAT classic output style. Displays all subsequent statistical output as ASCII 
text using the Courier font. With this option selected, no output appears in formatted 
tables. 


Output Scheme. The Output Scheme tab of the Global Options dialog allows you to 
customize the output format in terms of the font color, style (regular or bold) and 
background color of various components of the output (excluding graphs), as well as 
the page background. 
Figure: Edit: Options dialog, Output Scheme tab 


Echo. Specify the font color, style and background color of echoed commands. The 
default is a shade of teal, in the regular font style with a white background. 

Text. Specify the font color, style and background color of all text. The default is black 
color, in the regular font style with a white background. 

Error. Specify the font color, style and background color of error messages. The 
default is a crimson color, in the regular font style with a white background. 


Warning. Specify the font color, style and background color of warning messages. 
The default is a shade of brown, in the regular font style with a white background. 


Header. Specify the font color, style and background color of text headings. The 
default is a shade of blue, in the bold font style with a white background. 


Sub-header. Specify the font color, style and background color of text sub-headings. 
The default is a shade of blue, in the bold font style with a white background. 
Table caption. Specify the font color, style and background color of table captions. 
The default is a shade of blue, in the bold font style with a white background. 


Table header/footer. Specify the font color, style and background color of the text in 
table headers and footers. The default is black color, in the bold font style with an off- 
white background. 


Table body. Specify the font color, style and background color of the text in the table 
body. The default is black color, in the bold font style with a white background. 


Page background. Specify the background color and/or image for the entire page. 
The image should be stored in the PNG, BMP, JPG, GIF or EMF format, and can be in 
any location. 


Color Palette 


To change a color, click the corresponding color button, click on a pre-defined color in 
the color palette, or create your own color by clicking More colors. Clicking this opens 
the Color dialog. 


Figure: Color dialog 


Basic colors. Click one of the basic colors and press OK to use that color. 




Custom colors. Click a basic color to begin with. It shows up in the Color|Solid area, 
with the cross-hair at the corresponding point in the full color spectrum above it, and 
an arrow at the corresponding point in the color bar beside the spectrum. You can move 
the cross-hair to any point in the full spectrum, and slide the arrow to any height in the 
color bar. You can also enter hue, saturation, luminosity, and RGB values. Press Add 
to Custom Colors to add the color to the Custom colors palette. You can create any 
number of colors in this way. Finally, click a color and press OK to use that color. 
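Hue, saturation, and luminosity describe the same color as the RGB values. The Python sketch below converts between them using the standard colorsys module. It assumes, as in the standard Windows Color dialog, that Hue, Sat, and Lum are shown on a 0 to 240 scale and Red, Green, and Blue on a 0 to 255 scale; the function name dialog_hsl_to_rgb is hypothetical.

```python
import colorsys
from typing import Tuple

# Assumption: the Color dialog shows Hue, Sat and Lum on a 0-240 scale
# and Red, Green and Blue on a 0-255 scale.
HSL_SCALE = 240.0


def dialog_hsl_to_rgb(hue: int, sat: int, lum: int) -> Tuple[int, int, int]:
    """Convert dialog-style HSL values to 0-255 RGB values."""
    # colorsys works on 0-1 floats and takes its arguments in H, L, S order.
    r, g, b = colorsys.hls_to_rgb(hue / HSL_SCALE, lum / HSL_SCALE, sat / HSL_SCALE)
    return round(r * 255), round(g * 255), round(b * 255)


if __name__ == "__main__":
    print(dialog_hsl_to_rgb(0, 240, 120))  # (255, 0, 0): fully saturated red
```

Luminosity 0 always gives black and luminosity 240 gives white, whatever the hue and saturation.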


File Locations 


Use the File Locations tab to specify the folder containing the files used in the Graph 
Gallery, to designate file paths to append to filenames used in SYSTAT commands, and 
define paths to store command, graph and output files. 


Set project directory. Resets file paths for all file types to the appropriate sub-folders 
within the designated folder. Check Use common directory if you want all subsequent 
file opening and saving to occur directly within this folder. 


Set custom directories. As an alternative to specifying a project directory, you can 
specify individual folders based on file type or file operation. 

■ Graph Gallery. Specify the folder containing the command files and graphics used to 
generate the Graph Gallery. 

■ Open data. Sets the folder used for opening all SYSTAT data files (.SYZ 
and .SYS). When opening data files using the menus, the Open dialog initially 
defaults to this folder. This is set to the SYSTAT Data folder at the time of 
installation. 

■ Save data. Defines the folder used for saving all SYSTAT data files (.SYZ). When 
saving data files using the menus, the Save As dialog initially defaults to this 
folder. If a USE command is issued without a path, SYSTAT also looks for the file 
in this folder. This is set to the SYSTAT Data folder at the time of installation. 

■ Work data. Sets the folder used for saving all temporary data files (.SYZ). If a USE 
command is issued without a path, SYSTAT also looks for the file in this folder. 
This is set to the Windows temporary folder at the time of installation. 


■ Import data. Identifies the folder used for all data file importing. 

■ Export data. Identifies the folder used for all data file exporting. 


■ Command files. Sets the folder used for opening and saving SYSTAT command 
files. When opening or saving command files using the menus, the dialogs initially 
default to this folder. This is set to the SYSTAT Command folder at the time of 
installation. 
■ Output files. Associates the designated folder with all SYSTAT (.SYO) and 
HTML (.MHT) output files. When opening or saving output files using the 
menus, the dialogs initially default to this folder. 


■ ASCII output files. Sets the folder used for saving ASCII output files (.DAT) 
created using the OUTPUT command. 


■ Export graphs. Identifies the folder used for saving all graphic formats. 


■ Basic GET. Defines the folder used for reading ASCII files (.DAT) using the GET 
command. 


■ Basic PUT. Defines the folder used for writing ASCII files (.DAT) using the PUT 
command. 


Using Commands 


Among the general options, use TOKEN/ON or OFF to switch token substitution on or 
off. 


The following commands specify global output display options: 


Table 2: 

Command                             Description 
FORMAT m,n / UNDERFLOW              Indicates the format for numeric output. 
PLENGTH SHORT, MEDIUM, or LONG      Defines the length of statistical output. 
PAGE NARROW, WIDE, or NONE          Indicates the width of the output. 
VDISPLAY LABEL, NAME, or BOTH       Defines the use of variable labels in the output. 
LDISPLAY LABEL, NAME, or BOTH       Defines the use of value labels in the output. 
GRAPH                               Includes Quick Graphs generated by statistical 
                                    procedures in the output. Use GRAPH NONE to 
                                    suppress Quick Graphs. 
ECHO ON or OFF                      Indicates whether to echo commands in the output. 
CLASSIC ON or OFF                   Controls the appearance of statistical results. 
FPATH path / PROJECT or filetype    Specifies a path prefix to append to filenames. If 
                                    path is not specified, all file locations are set to 
                                    the program folder. If no option is specified, all 
                                    directories are set to the specified path. PROJECT 
                                    sets path as the root directory under which the 
                                    sub-folders Gallery, Data, Command, and Output are 
                                    created. 


For the filetype in the FPATH statement, specify one of the following: GALLERY, USE, 
SAVE, WORK, IMPORT, EXPORT, SUBMIT, OSAVE, OUTPUT, GSAVE, GET and PUT. 
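The effect of a path prefix can be sketched as follows: a filename given without a folder is joined to the folder set for its file type, and a filename that already includes a folder is used unchanged. This Python sketch is a hypothetical model. The function name resolve_filename and the example folder are illustrative, Windows path rules are applied through the standard ntpath module, and file extensions and SYSTAT's search of several folders (such as Save data and Work data for USE) are not modeled.

```python
import ntpath
from typing import Dict


def resolve_filename(filename: str, filetype: str,
                     fpaths: Dict[str, str]) -> str:
    """Prefix a bare filename with the folder set for its file type."""
    # A filename that already carries a folder (or drive) is used unchanged.
    if ntpath.dirname(filename):
        return filename
    folder = fpaths.get(filetype.upper())
    # No folder defined for this file type: leave the name as typed.
    return ntpath.join(folder, filename) if folder else filename


if __name__ == "__main__":
    paths = {"USE": r"C:\SYSTAT\Data"}  # hypothetical folder for illustration
    print(resolve_filename("EGYPTDM", "use", paths))  # C:\SYSTAT\Data\EGYPTDM
```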
Applications 


SYSTAT offers applications in the following fields: 
■ Anthropology 
■ Astronomy 
■ Biology 
■ Chemistry 
■ Engineering 
■ Environmental Sciences 
■ Genetics 
■ Manufacturing 
■ Medical Research 
■ Psychology 
■ Sociology 
■ Statistics 
■ Toxicology 


You can find these applications in the online Help. Use the Contents tab of the Help 
system to access the Application Gallery. In the gallery, you will find sample analyses 
with their associated commands and menu selections. All relevant data and command 
files are included. 
Anthropology 


Egyptian Skulls Data 


EGYPTDM data consists of four measurements of male Egyptian skulls from five 
different time periods ranging from 4000 B.C. to 150 A.D. 


Variable Description 
MB, BH, BL, NH Skull measurements 
YEAR Year of measurement 


The data can be analyzed to determine whether skull size changed between the time 
periods. The researchers theorize that a change in skull size over time is evidence of 
interbreeding of the Egyptians with immigrant populations over the years. Because 
four different measurements characterize skull size, multivariate techniques that 
allow multiple dependent variables can be used. The dependent variables are the 
measurements MB, BH, BL, and NH; the predictor variable is YEAR. If YEAR is treated 
as a discrete predictor variable, the data can be analyzed using MANOVA. If the change 
in skull size is assumed to follow a linear trend, YEAR can instead be treated as a 
continuous predictor variable. 

Potential analyses include MANOVA, regression, and principal components. 


Box Plot and Regression 


The input is: 


USE EGYPTDM 
THICK 2.5 
BEGIN 
DENSITY MB BL*YEAR/BOX, FCOLOR=1, FILL=1, XMAX=1000, 
XMIN=-5000, COLOR=3, 11, HEIGHT=5.5, WIDTH=4, 
XTIC=4, 
TITLE='Variation of Skull Measurements by Period' 
PLOT MB BL * YEAR / SMOOTH=LINEAR, SIZE=0, XMAX=1000, 
XMIN=-5000, XTIC=4, COLOR=4, HEIGHT=5.5, 
WIDTH=4 
END 
The output is: 


Variation of Skull Measurements by Period 


MANOVA 


The input is: 


PLENGTH SHORT 
USE EGYPTDM 
MANOVA 
MODEL MB BH BL NH = CONSTANT + YEAR 
ESTIMATE 


The output is: 
N of Cases Processed : 150 


Dependent Variable Means 

      MB        BH        BL        NH 
 133.973   132.547       ...    50.933 


Regression Coefficients B = (X'X)^-1 X'Y 

Factor          MB        BH        BL        NH 
CONSTANT       ...       ...       ...    81.502 
YEAR         0.001    -0.001    -0.001     0.000 


Information Criteria 

AIC                3468.115 
AIC (Corrected)    3473.336 
Schwarz's BIC      3522.306 


Multiple Correlations 
      MB        BH        BL        NH 
   0.371     0.181     0.425     0.170 

Adjusted R² = 1-(1-R²)*(N-1)/df, where N = 150, and df = 148 
Adjusted R² 
   0.132     0.026     0.175     0.022 


Plot of Residuals vs Predicted Values (panels: RESIDUAL(1) to RESIDUAL(4)) 
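The adjusted values in this output follow directly from the formula quoted with them, 1-(1-R²)*(N-1)/df with N = 150 and df = 148. The Python sketch below recomputes them from the four printed multiple correlations, which are assumed to be in the MB, BH, BL, NH order of the MODEL statement. Because the printed correlations are rounded to three decimals, agreement is expected only to three decimals. The function name adjusted_r2 is illustrative.

```python
from typing import List


def adjusted_r2(multiple_r: float, n: int, df: int) -> float:
    """Adjusted R-squared = 1 - (1 - R^2) * (N - 1) / df."""
    r2 = multiple_r ** 2
    return 1.0 - (1.0 - r2) * (n - 1) / df


# Printed multiple correlations (rounded to three decimals in the output).
multiple_correlations: List[float] = [0.371, 0.181, 0.425, 0.170]
adjusted = [round(adjusted_r2(r, n=150, df=148), 3) for r in multiple_correlations]
print(adjusted)  # [0.132, 0.026, 0.175, 0.022]
```

The agreement confirms that the adjusted row reports adjusted squared multiple correlations rather than adjusted correlations.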
Astronomy 


Sunspot Cycles 


SUNSPTDM data consists of a calculated relative measure of the daily number of 
sunspots compiled from the observations of a number of different observatories. 


Variables Description 

YEAR The year the observations were made 

JAN-DEC The relative measure of sunspots for the indicated month 
ANNUAL The mean relative measure of sunspots for the entire year 


Sunspots exhibit cyclical behavior on a 10 to 11 year cycle. These cycles have 
potentially important effects on the earth’s ecosystem, including weather and the 
growth and development of living organisms. The natural causes and effects of sunspot 
behavior are therefore important areas of scientific exploration. 
Potential analyses include Time Series (smoothing, autocorrelation, Fourier 
analysis, ARIMA, etc.) and Descriptive Statistics (variance and distribution). 
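An autocorrelation plot shows how strongly a series correlates with lagged copies of itself, so a cycle of about 11 years should appear as a peak near lag 11. The following Python sketch computes the usual sample autocorrelation r_k = c_k / c_0, in which both sums use deviations from the series mean. The toy series is hypothetical, not SUNSPTDM. SYSTAT's ACF may differ in details such as its confidence bands.

```python
def acf(series, max_lag):
    """Sample autocorrelations r_k = c_k / c_0 for lags k = 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev)  # lag-0 sum of squared deviations
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
        for k in range(max_lag + 1)
    ]


# Toy series (hypothetical, not the sunspot data)
r = acf([1, 2, 3, 4, 5], max_lag=2)
print([round(v, 3) for v in r])  # [1.0, 0.4, -0.1]
```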


Autocorrelation Plot 


The input is: 
USE SUNSPTDM 
SERIES 
ACF ANNUAL 
The output is: 
[Figure: Autocorrelation Plot of ANNUAL] 
Biology 


Mortality Rates of Mediterranean Fruit Flies 


The FRTFLYDM data contains information on mortality rates for Mediterranean fruit 
flies over 172 days, after which all flies died. Experimenters recorded the number of 
flies dying each day and divided this by the number alive at the beginning of the day 
to measure the mortality rate for each day. 


Variable Description 
DAY Day number 
LIVING Number of fruit flies alive at the beginning of the day 


MORTRATE Mortality rate of the fruit flies for each day 


The Mediterranean fruit fly data can be used to determine the functional form of 
mortality rate as a function of time. A scatterplot of these two variables suggests that 
mortality rate might be a cubic function of time. Since the number of fruit flies alive is 
directly determined by these two variables, the mortality rate function can be 
substituted into an equation for the number of fruit flies living as a function of time 
(which appears to be exponentially decreasing) to estimate parameters for the 
nonlinear model. 
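The nonlinear model fitted below can be written out as plain functions. This is a minimal sketch: N0 = 1203646 is the constant from the MODEL statement, and the parameter values in the usage lines are hypothetical. By elementary calculus, the instantaneous mortality implied by the curve is the derivative of the exponent, A + 2B·DAY + 3C·DAY², and the daily rate recorded in MORTRATE approximates it when daily death rates are small.

```python
import math

N0 = 1203646  # constant in the MODEL statement


def living(day, a, b, c):
    """Expected number alive: N0 * exp(-(A + B*DAY + C*DAY^2) * DAY)."""
    return N0 * math.exp(-(a + b * day + c * day ** 2) * day)


def hazard(day, a, b, c):
    """Instantaneous mortality -d/dt ln N(t) = A + 2*B*t + 3*C*t^2."""
    return a + 2 * b * day + 3 * c * day ** 2


# Hypothetical parameter values, for illustration only
print(round(living(10, 0.01, 0.001, 1e-5)), round(hazard(10, 0.01, 0.001, 1e-5), 4))
```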

Potential analyses include nonlinear modeling, linear regression, and 
transformations. 


Nonlinear Modeling Showing an Exponential Decline in Fruit Flies Over Time 


The input is: 


USE FRTFLYDM 

NONLIN 
MODEL LIVING = 1203646*exp(-(A+B*DAY+C*DAY^2)*DAY) 
ESTIMATE / ITER=50 
The output is: 


Iteration History 


No. | Loss A 
0 | 1.541E+013 0.010 
1 | 1.508E+013 -0.016 
2 | 1.468E+013 -0.041 
3 | 1.416E+013 -0.064 
4 | 1.411E+013 -0.066 
5 | 1.411E+013 -0.066 
6 | 1.411E+013 -0.066 
7 | 1.410E+013 -0.066 
8 | 1.410E+013 -0.066 
9 | 1.410E+013 -0.066 
10 | 1.410E+013 -0.066 
11 | 1.410E+013 -0.066 
12 | 1.410E+013 -0.066 
13 | 1.410E+013 -0.066 
14 | 1.127E+013 0.006 
15 | 7.117E+012 0.049 
16 | 4.213E+012 0.053 
17 | 5.111E+011 0.015 
18 | 1.621E+011 -0.004 
19 | 2.562E+010 -0.021 
20 | 2.282E+010 -0.021 
21 | 2.228E+010 -0.021 
22 | 2.164E+010 -0.021 
23 | 1.384E+010 -0.015 
24 | 1.309E+010 -0.013 
25 | 1.305E+010 -0.013 
26 | 1.305E+010 -0.013 
27 | 1.305E+010 -0.013 
28 | 1.305E+010 -0.013 
29 | 1.305E+010 -0.013 
30 | 1.305E+010 -0.013 
31 | 1.305E+010 -0.013 


Dependent Variable : LIVING 


Sum of Squares and Mean Squares 


Source | SS df Mean Squares 
Regression | 2.363E+013 3 7.877E+012 
Residual | 1.305E+010 170 76738341.153 
Total | 2.364E+013 173 
Mean corrected | 1.983E+013 172 


R-squares 


Raw R-square (1-Residual/Total) : 0.999 
Mean Corrected R-square (1-Residual/Corrected) : 0.999 
R-square (Observed vs Predicted) : 0.999 
Parameter Estimates 
Wald 95% Confidence Interval 
Parameter | Estimate ASE Parameter/ASE Lower Upper 
A | -0.013 0.001 -14.165 -0.014 -0.011 
B | 0.002 0.000 21.259 0.002 0.002 
C | 0.000 0.000 4.773 0.000 0.000 
Asymptotic Correlation Matrix of Parameters 
| A B C 
A | 1.000 
B | -0.952 1.000 
C | 0.866 -0.971 1.000 
[Figure: Scatter Plot, DAY on the x-axis] 
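The two R-squares in the Sum of Squares table follow directly from the labels printed beside them: 1 - Residual/Total and 1 - Residual/Corrected. The following Python sketch applies those formulas to the printed sums of squares. It only reproduces the arithmetic and does not refit the model.

```python
def r_squares(ss_residual, ss_total, ss_mean_corrected):
    """Raw and mean-corrected R-square from a sums of squares table."""
    raw = 1.0 - ss_residual / ss_total
    corrected = 1.0 - ss_residual / ss_mean_corrected
    return raw, corrected


raw, corrected = r_squares(1.305e10, 2.364e13, 1.983e13)
print(round(raw, 3), round(corrected, 3))  # 0.999 0.999
```

The same function applied to the enzyme example later in this chapter (Residual 0.014, Total 15.418, Mean corrected 5.763) gives 0.999 and 0.998.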
Scatterplot 
The input is: 


USE FRTFLYDM 

PLOT LIVING*DAY*MORTRATE/AX=CORNER, FILL, FCOLOR=GRAY, 
COLOR=RED, XLAB='Number of Flies Living', 
YLAB='Days Passed', ZLAB='Mortality Rate', 
XGRID, YGRID, ZGRID, 
TITLE='Fruit Fly Mortality Rates Over Time' 
The output is: 


[Figure: Fruit Fly Mortality Rates Over Time] 


Animal Predatory Danger 


SLEEPDM data contains information from a study on the effects of physical and 
biological characteristics and sleep patterns influencing the danger of a mammal being 
eaten by predators. The study includes data on the hours of dreaming and nondreaming 
sleep, gestation age, and body and brain weight for 62 mammals. 


Variable Description 
SPECIES$ Type of species 
BODY Body weight of the mammal in kg 
BRAIN Brain weight of the mammal in g 
SLO_SLP Number of hours of non-dreaming sleep 
DREAM_SLP Number of hours of dreaming sleep 
TOTAL_SLEEP Number of hours of total sleep 
LIFE The life span in years 
GESTATE The gestation age 
PREDATION Index of predation as a quantitative variable 
EXPOSURE Index of exposure as a quantitative variable 
The danger faced by mammals may be due to the environment they are in or their 
biological and physical characteristics. These studies are used to assess whether 
physical and biological attributes in mammals play a significant role in determining the 
predatory danger faced by mammals. 

Potential analyses include regression trees, multiple regression, and discriminant 
analysis. 


Regression Tree with DIT Plots 


The input is: 


USE SLEEPDM 

TREES 
MODEL DANGER=BODY, BRAIN, SLO_SLP, DREAM_SLP, GESTATE 
ESTIMATE / DENSITY=DIT 


The output is: 
18 Cases Deleted due to Missing Data. 
Split Variable PRE Improvement 
1 DREAM_SLP 0.404 0.404 
2 BODY 0.479 0.074 
3 SLO_SLP 0.547 0.068 
Fitting Method : Least Squares 
Predicted Variable : DANGER 
Minimum Split Index Value : 0.050 
Minimum Improvement in PRE : 0.050 
Maximum Number of Nodes Allowed : 21 
Minimum Count Allowed in Each Node : 5 
Number of Terminal Nodes in Final Tree : 4 
Proportional Reduction in Error (PRE) : 0.547 
Count Split Variable Cut Value Fit 
44 DREAM_SLP 1.200 0.404 
14 BODY 4.190 0.408 
30 SLO_SLP 12.800 0.164 
[Figure: Decision Tree] 
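The PRE column reports the proportional reduction in error: one minus the within-node sum of squares divided by the total sum of squares, since the fitting method is least squares. The following Python sketch searches one predictor for the single binary split that maximizes PRE. The function names are hypothetical. The sketch does not grow a full tree and does not apply the minimum split index or minimum improvement rules listed in the output. Its cut value is the largest predictor value sent to the left node, which may not match how SYSTAT reports cut values.

```python
def sse(values):
    """Sum of squared deviations from the mean (least squares loss)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y, min_count=1):
    """Return (cut, pre) for the best split 'x <= cut' of one predictor."""
    min_count = max(1, min_count)
    pairs = sorted(zip(x, y))
    total = sse(y)
    best_cut, best_pre = None, 0.0
    for i in range(min_count, len(pairs) - min_count + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # tied predictor values cannot be separated
        left = [yy for _, yy in pairs[:i]]
        right = [yy for _, yy in pairs[i:]]
        pre = 1.0 - (sse(left) + sse(right)) / total if total > 0 else 0.0
        if pre > best_pre:
            best_cut, best_pre = pairs[i - 1][0], pre
    return best_cut, best_pre


cut, pre = best_split([1, 2, 3, 4], [0, 0, 10, 10])
print(cut, pre)  # 2 1.0
```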


Chemistry 


Enzyme Reaction Velocity 


ENZYMDM data consists of measurements from an enzymatic reaction experiment on the 
effects of an inhibitor on the reaction velocity of an enzyme and substrate. 


Variable Description 

VELOCITY Reaction velocity 

SUB_CONC Substrate concentration 

INH_CONC Inhibitor concentration 

Understanding how reaction rates depend on the various reaction conditions is critical 
to optimizing the yield of a reaction. Also, the functional dependence of the rate on 
reaction parameters serves as a test of the theoretical models used to interpret a 
chemical reaction. 
Potential analyses include nonlinear modeling, bootstrapping, and smoothing. 
Estimation using Bootstrap Method 


The input is: 


USE ENZYMDM 
NONLIN 
MODEL VELOCITY = VMAX*SUB_CONC/(KM*(1+INH_CONC/KIS)+SUB_CONC) 
ESTIMATE / SAMPLE=BOOT(100) 


Next, the ESTIM file is used to draw the density plots. ESTIM contains the estimated 
parameters for each bootstrap sample. 
USE ESTIM 


CBSTAT / MEAN, SD, SEM 
DENSITY VMAX, KM, KIS 


The output is: 


Arithmetic Mean | 
Standard Error of Arithmetic Mean | 0.001 0.003 0.000 
Standard Deviation | 


[Figure: Density plots of the bootstrap estimates of VMAX, KM and KIS] 
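The bootstrap run resamples the cases with replacement, re-estimates the parameters on each sample, and stores the estimates. CBSTAT then summarizes them with MEAN, SD and SEM. The following Python sketch shows that resample, re-estimate and summarize loop in generic form. The estimator here is a stand-in (a sample mean on hypothetical data). Mirroring the example exactly would require a function that refits VMAX, KM and KIS on each resample, which this sketch does not include.

```python
import random
import statistics


def bootstrap(data, estimator, n_samples=100, seed=0):
    """Resample cases with replacement and apply the estimator to each sample."""
    rng = random.Random(seed)
    n = len(data)
    return [estimator([data[rng.randrange(n)] for _ in range(n)])
            for _ in range(n_samples)]


def summarize(estimates):
    """Mean, standard deviation and standard error of the mean of the estimates."""
    sd = statistics.stdev(estimates)
    return {"mean": statistics.fmean(estimates), "sd": sd,
            "sem": sd / len(estimates) ** 0.5}


# Hypothetical data; the estimator stands in for a model refit
estimates = bootstrap([0.62, 0.55, 0.71, 0.48, 0.66], statistics.fmean, n_samples=100, seed=1)
print(summarize(estimates))
```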
Nonlinear Analysis 


The input is: 


USE ENZYMDM 

NONLIN 
MODEL VELOCITY = VMAX*SUB_CONC/(KM*(1+INH_CONC/KIS)+SUB_CONC) 
ESTIMATE 


The output is: 


Iteration History 


No. | Loss VMAX KM KIS 
0 | 3.568 1.010 1.020 1.030 
1 | 3.192 1.009 0.988 0.651 
2 | 2.897 1.011 0.961 0.481 
3 | 0.772 1.021 0.873 0.075 
4 | 0.154 1.134 0.845 0.029 
5 | 0.014 1.260 0.847 0.027 
6 | 0.014 1.259 0.847 0.027 
7 | 0.014 1.260 0.847 0.027 
8 | 0.014 1.260 0.847 0.027 
Dependent Variable : VELOCITY 


Sum of Squares and Mean Squares 
Source | SS df Mean Squares 
Regression | 15.404 3 5.135 
Residual | 0.014 43 0.000 
Total | 15.418 46 
Mean corrected | 5.763 45 

R-squares 

Raw R-square (1-Residual/Total) : 0.999 
Mean Corrected R-square (1-Residual/Corrected) : 0.998 
R-square (Observed vs Predicted) : 0.998 


Parameter Estimates 


Wald 95% Confidence Interval 
Parameter | Estimate ASE Parameter/ASE Lower Upper 
VMAX | 1.260 . 104.191 . . 
KM | 0.847 0.027 31.876 0.793 . 
KIS | 0.027 0.001 31.033 0.025 . 
[Figure: Nonlinear Graph] 
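The MODEL statement has the standard form of the Michaelis-Menten equation with competitive inhibition: the inhibitor multiplies the apparent KM by (1 + INH_CONC/KIS). The following Python sketch evaluates the model at the final estimates from the iteration history. It shows two checks: with no inhibitor and SUB_CONC = KM, the velocity is half of VMAX, and adding inhibitor at INH_CONC = KIS lowers it to a third of VMAX.

```python
def velocity(sub_conc, inh_conc, vmax, km, kis):
    """VELOCITY = VMAX*SUB_CONC / (KM*(1 + INH_CONC/KIS) + SUB_CONC)."""
    apparent_km = km * (1.0 + inh_conc / kis)  # inhibitor inflates the apparent KM
    return vmax * sub_conc / (apparent_km + sub_conc)


VMAX, KM, KIS = 1.260, 0.847, 0.027  # final values from the iteration history
print(round(velocity(KM, 0.0, VMAX, KM, KIS), 3))   # 0.63, half of VMAX
print(round(velocity(KM, KIS, VMAX, KM, KIS), 3))   # 0.42, a third of VMAX
```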
DWLS Smoother 
The input is: 
USE ENZYMDM 
CSIZE 1.3 
THICK 1.7 
BEGIN 
PLOT VELOCITY*INH_CONC*SUB_CONC / SIZE=0, SMOOTH=DWLS, 
TENSION=0.500, TITLE='', XLABEL='', YLABEL='', 
ZLABEL='', AXES=CORNER, ACOLOR=BLACK, YGRID, 
ZGRID, FCOLOR=gray, ZMAX=1.1, 
HEIGHT=3.75, WIDTH=3.75, ALTITUDE=3.75 
FACET XY 
PLOT VELOCITY*INH_CONC*SUB_CONC / SIZE=0, SMOOTH=DWLS, 
TENSION=0.500, TITLE='', XLABEL='', YLABEL='', 
AXES=no, SC=no, legend=no, FCOLOR=white, 
ZMAX=1.1, tile, HEIGHT=3.75, WIDTH=3.75, 
ALTITUDE=3.75 
FACET 
PLOT VELOCITY*INH_CONC*SUB_CONC / SIZE=0, SMOOTH=DWLS, 
TENSION=0.500, TITLE='', XLABEL='', YLABEL='', 
ZLABEL='', ZMAX=1.1, HEIGHT=3.75, WIDTH=3.75, 
ALTITUDE=3.75 
PLOT VELOCITY*INH_CONC*SUB_CONC / SIZE=0, SMOOTH=DWLS, 
SURF=XYCUT, TENSION=0.500, TITLE='', XLABEL='', 
YLABEL='', ZLABEL='', ZMAX=1.1, 
HEIGHT=3.75, WIDTH=3.75, 
ALTITUDE=3.75 
PLOT VELOCITY*INH_CONC*SUB_CONC / COLOR=11, FILL=1, SIZE=1.3, 
TITLE='Enzyme Reaction Velocity by Concentration', 
XLABEL='Substrate Concentration', 
YLABEL='Inhibitor Concentration', 
ZLABEL='Reaction Velocity', 
ZMAX=1.1, HEIGHT=3.75, WIDTH=3.75, 
ALTITUDE=3.75 
PLOT VELOCITY*INH_CONC*SUB_CONC / COLOR=2, FILL=0, SIZE=1.3, 
TITLE='Enzyme Reaction Velocity by Concentration', 
XLABEL='Substrate Concentration', 
YLABEL='Inhibitor Concentration', 
ZLABEL='Reaction Velocity', 
ZMAX=1.1, HEIGHT=3.75, WIDTH=3.75, 
ALTITUDE=3.75 
END 
THICK 1 
CSIZE 1 
The output is: 


[Figure: Enzyme Reaction Velocity by Concentration] 
Engineering 


Robust Design - Design of Experiments 


DESIGNDM data consists of the results of a designed experiment to improve the 
performance of a fuel gauge. 


Variable Description 

RUN The case ID 

SPRING Dummy variable for the type of spring used 

POINTER Dummy variable for the type of pointer used 

VENDOR Dummy variable for the vendor used 

ANGLE Dummy variable for the type of angle bracket used 
READING The reading of the fuel gauge under the designed conditions 


This example is a demonstration of the use of Design of Experiments (DOE) in the 
product development process. A four-factor, two-level fractional design is used to 
minimize the data collection needed to analyze the factors affecting the performance 
of a fuel gauge: SPRING, POINTER, VENDOR, and ANGLE. 


ANOVA 


The input is: 


USE DESIGNDM 

ANOVA 
CATEGORY SPRING / REPLACE 
DEPEND READING 
ESTIMATE 

ANOVA 
CATEGORY POINTER / REPLACE 
DEPEND READING 
ESTIMATE 

ANOVA 
CATEGORY VENDOR / REPLACE 
DEPEND READING 
ESTIMATE 

ANOVA 
CATEGORY ANGLE / REPLACE 
DEPEND READING 
ESTIMATE 
The output is: 


Effects coding used for categorical variables in model. 
Categorical values encountered during processing are 
Variables | Levels 
SPRING (2 levels) | -1.000 1.000 


Dependent Variable | READING 
N | 16 
Multiple R | 0.386 
Squared Multiple R | 0.149 


Analysis of Variance 


Source | Type III SS df Mean Squares F-ratio 
SPRING | 25.000 1 25.000 2.448 
Error | 143.000 14 10.214 


[Figure: Least Squares Means of READING] 


Durbin-Watson D Statistic | 1.103 
First Order Autocorrelation | 0.404 
Effects coding used for categorical variables in model. 
Categorical values encountered during processing are 
Variables | Levels 
POINTER (2 levels) | -1.000 1.000 


Dependent Variable | READING 
N | 16 
Multiple R | 0.000 
Squared Multiple R | 0.000 
Analysis of Variance 
Source | Type III SS df F-ratio p-value 
POINTER | 0.000 1 0.000 1.000 
Error | 168.000 14 
[Figure: Least Squares Means of READING] 
*** WARNING *** 
Case 11 is an Outlier (Studentized Residual : 2.839) 
Durbin-Watson D Statistic | 1.512 
First Order Autocorrelation | 0.201 


Effects coding used for categorical variables in model. 
Categorical values encountered during processing are 
Variables | Levels 
VENDOR (2 levels) | -1.000 1.000 
Dependent Variable | READING 
N | 16 
Multiple R | 0.270 
Squared Multiple R | 0.073 


Analysis of Variance 


Source | Type III SS df Mean Squares F-ratio p-value 
VENDOR | 12.250 1 12.250 1.101 0.312 
Error | 155.750 14 11.125 


[Figure: Least Squares Means of READING] 


Durbin-Watson D Statistic | 1.645 
First Order Autocorrelation | 0.137 


Effects coding used for categorical variables in model. 
Categorical values encountered during processing are 
Variables | Levels 
ANGLE (2 levels) | -1.000 1.000 
Dependent Variable | READING 
N | 16 
Multiple R | 0.463 
Squared Multiple R | 0.214 


Analysis of Variance 


Source | Type III SS df Mean Squares F-ratio p-value 
ANGLE | 36.000 1 36.000 3.818 0.071 
Error | 132.000 14 9.429 


[Figure: Least Squares Means of READING] 


Durbin-Watson D Statistic | 
First Order Autocorrelation | 
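Each of these one-factor analyses splits the same total sum of squares (168.000 on 15 df) into a factor part and an error part. The printed mean squares, F-ratio and multiple R all follow from those two numbers. The following Python sketch reproduces that arithmetic from the printed Type III SS values. It does not compute the p-values.

```python
import math


def one_way_summary(ss_effect, ss_error, df_effect, df_error):
    """Mean squares, F-ratio, R-square and multiple R from an ANOVA table."""
    ms_effect = ss_effect / df_effect
    ms_error = ss_error / df_error
    r2 = ss_effect / (ss_effect + ss_error)
    return {"F": ms_effect / ms_error, "R2": r2, "R": math.sqrt(r2), "MSE": ms_error}


for name, ss in [("SPRING", 25.0), ("POINTER", 0.0), ("VENDOR", 12.25), ("ANGLE", 36.0)]:
    s = one_way_summary(ss, 168.0 - ss, 1, 14)
    print(name, round(s["F"], 3), round(s["R"], 3), round(s["R2"], 3))
```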


Creating the Four Factor, Two Level Design Matrix 


The input is: 
DESIGN 
SAVE XDESIGN 
FACTORIAL / LEVELS=2 FACTORS=4 REPS=1 
Once the design matrix is created, the following steps complete the DOE process: 
■ Assigning variable names 
■ Assigning factor level labels 
■ Collecting and entering data 
■ Performing analyses 
The output is: 


[Data Editor: XDESIGN.syz, the 16-run design matrix for four two-level factors, with levels coded -1.000 and 1.000] 
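A two-level design with four factors and one replicate has 2^4 = 16 runs, one for each combination of the coded levels -1 and +1. This matches N = 16 in the ANOVA output. The following Python sketch generates such a matrix and checks its balance and orthogonality. The run order SYSTAT uses may differ. The column names in the comment are an assumption, because names are assigned after the design is created.

```python
from itertools import product


def full_factorial(n_factors, levels=(-1, 1)):
    """All level combinations of n_factors two-level factors, coded -1/+1."""
    return [list(run) for run in product(levels, repeat=n_factors)]


design = full_factorial(4)  # columns could become SPRING, POINTER, VENDOR, ANGLE
for run in design:
    print(run)
```

Every column sums to zero and every pair of columns has a zero dot product, which is why each factor's effect can be estimated independently of the others.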


Dot Plots 


The following plots assume that we have collected data in accordance with a generated 
experimental design. 


The input is: 
USE DESIGNDM 
CATEGORY SPRING POINTER VENDOR ANGLE 
THICK 6 
CSIZE 2 
DOT READING*SPRING POINTER VENDOR ANGLE / LINE, SERROR=.95, 
COLOR=1, FCOLOR=2, 
TITLE='Fuel Gauge Designed Experiment Results' 
CSIZE 1 
THICK 1 
The output is: 


[Figure: Fuel Gauge Designed Experiment Results, dot plots of READING by factor] 


Environmental Science 


Mercury Levels in Freshwater Fish 


The MRCURYDM data consists of measurements of largemouth bass from 53 Florida lakes, 
collected to examine the factors that influence the level of mercury contamination. 
The pH level, amount of chlorophyll, calcium, and alkalinity were measured from 
water samples collected at each lake. From a sample of fish taken from each lake, the 
age of each fish and the mercury concentration in its muscle tissue were measured 
(older fish tend to have higher concentrations). To make a fair comparison of the fish in 
different lakes, the investigators used a regression estimate of the expected mercury 
concentration in a three-year-old fish as the standardized value for each lake. Finally, 
in 10 of the 53 lakes, the age of the individual fish could not be determined, and the 
average mercury concentration of the sampled fish was used instead. 
Variable Description 
ID Lake ID 
LAKE$ Lake name 
ALKLNTY Measured alkalinity of the lake (mg/L as calcium carbonate) 
PH Measured pH of the lake 
CALCIUM Measured calcium of the lake (mg/L) 
CHLORO Measured chlorophyll of the lake (mg/L) 
AVGMERC Average mercury concentration (parts per million) in the tissue of the fish 
sampled from the lake 
SAMPLES Number of fish sampled in the lake 
MIN Minimum mercury concentration in sampled fish from the lake 
MAX Maximum mercury concentration in sampled fish from the lake 
STDMERC Regression estimate of the mercury concentration in a 3-year-old fish 
from the lake 
AGEDATA Indicator of the availability of age data on fish sampled 
LNCHLORO Log of CHLORO 


Mercury is a toxic element. Its presence in the environment arises from pollution, and 
it subsequently becomes part of the food chain, creating potentially harmful effects for 
both animals and humans. Understanding the level and causes of contamination of the 
environment by such pollutants is an important problem in environmental science. 

Potential analyses include descriptive statistics (variance and distribution), 
transformations, correlation and regression. 


Regression of Standard Mercury Level on Lake Alkalinity 


The input is: 


USE MRCURYDM 
PLOT STDMERC*ALKLNTY / ELL, SMOOTH=LINEAR, BORDER=BOX, 
FILL=1, XLAB='Alkalinity', YLAB='Mercury', 
COLOR=3, FCOLOR=2, 
TITLE='Measured Mercury Levels in Freshwater Fish vs Alkalinity' 
The output is: 
[Figure: Measured Mercury Levels in Freshwater Fish vs. Alkalinity] 


The Graph Window can be used to transform both the Alkalinity and Standard Mercury 
variables so that they meet the assumptions of linear regression. 


The graph below has X-Power=0.7; Y-Power=0.4 
[Figure: Measured Mercury Levels in Freshwater Fish vs. Alkalinity, transformed axes] 
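The X-Power and Y-Power settings apply a ladder-of-powers transformation to each axis. A power below 1 pulls in a long right tail, and a power of 0 is conventionally taken as the logarithm. The following Python sketch shows that transformation. The natural log is assumed for power 0; the base only rescales the axis. The alkalinity values are hypothetical, not taken from MRCURYDM.

```python
import math


def power_transform(x, power):
    """Ladder-of-powers transform: x**power, with power 0 taken as log(x)."""
    if x <= 0:
        raise ValueError("power transformations need positive values")
    return math.log(x) if power == 0 else x ** power


alkalinity = [4.0, 25.0, 100.0]  # hypothetical values, not from MRCURYDM
print([round(power_transform(v, 0.7), 3) for v in alkalinity])
```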


Genetics 


Bayesian Estimation of Gene Frequency 


Note: This example will work with the Monte-Carlo add-on module version 1. 


Rao (1973) illustrated maximum likelihood estimation of gene frequencies of O, A and 
B blood groups through the method of scoring. McLachlan and Krishnan (1997) used 
the EM algorithm for the same problem. This application illustrates Bayesian 
estimation of these gene frequencies by the Gibbs Sampling method. 


Consider the following multinomial model with four cell frequencies and their 
probabilities, with parameters $p$, $q$ and $r$, where $p + q + r = 1$. 
Let $n = n_O + n_A + n_B + n_{AB}$. 


Data Model 
$n_O = 176$ $r^2$ 
$n_A = 182$ $p^2 + 2pr$ 
$n_B = 60$ $q^2 + 2qr$ 
$n_{AB} = 17$ $2pq$ 
Let us consider hypothetical augmented data for this problem, $n_{OO}$, $n_{AA}$, $n_{AO}$, 
$n_{BB}$, $n_{BO}$ and $n_{AB}$, with a multinomial model 
$\{(1-p-q)^2,\ p^2,\ 2p(1-p-q),\ q^2,\ 2q(1-p-q),\ 2pq\}$. 
With respect to this full model, $n_{AA}$ and $n_{BB}$ can be regarded as missing data. 
MODEL: 

$X \sim \text{Multinomial}\left(435;\ (1-p-q)^2,\ p^2,\ 2p(1-p-q),\ q^2,\ 2q(1-p-q),\ 2pq\right)$ 

Prior information: 

$(p, q, r) \sim \text{Dirichlet}(\alpha, \beta, \gamma)$ 

The full conditional densities take the form: 

$n_{AA} \sim \text{Binomial}\left(n_A,\ \dfrac{p^2}{p^2 + 2pr}\right)$ 
$n_{BB} \sim \text{Binomial}\left(n_B,\ \dfrac{q^2}{q^2 + 2qr}\right)$ 
$p \sim (1-q)\,\text{Beta}(2n_{AA} + n_{AO} + n_{AB} + \alpha,\ 2n_{OO} + n_{AO} + n_{BO} + \gamma)$ 
$q \sim (1-p)\,\text{Beta}(2n_{BB} + n_{BO} + n_{AB} + \beta,\ 2n_{OO} + n_{AO} + n_{BO} + \gamma)$ 

To generate random samples of $p$ and $q$, the value generated from the beta 
distribution has to be multiplied by $(1-q)$ and $(1-p)$ respectively. Since this cannot 
be implemented directly in our system, let us consider: 

$p \sim \text{Beta}(2n_{AA} + n_{AO} + n_{AB} + \alpha,\ 2n_{OO} + n_{AO} + n_{BO} + \gamma)$ 
$q \sim \text{Beta}(2n_{BB} + n_{BO} + n_{AB} + \beta,\ 2n_{OO} + n_{AO} + n_{BO} + \gamma)$ 

and, whenever $p$ and $q$ appear in other full conditionals, replace $p$ by $(1-q)p$ and 
$q$ by $(1-p)q$. Take $\alpha = 2$, $\beta = 2$ and $\gamma = 2$. 
Gene Frequency Estimation using Gibbs Sampling 


The input is: 


FORMAT 10 5 
MCMC 
NAA=40 
NBB=5 
P=0.1 
Q=0.5 
N1=182 
N2=60 
GVAR NAA,NBB,P,Q 
FUNCTION FC1() 
{ 
P1=(((1-Q)*P)^2)/((((1-Q)*P)^2)+(2*((1-Q)*P)*(1-, 
((1-Q)*P)-((1-P)*Q)))) 
NAA=NRN(N1,P1) 
} 
FUNCTION FC2() 
{ 
P2=(((1-P)*Q)^2)/((((1-P)*Q)^2)+(2*((1-P)*Q)*(1-, 
((1-P)*Q)-((1-Q)*P)))) 
NBB=NRN(N2,P2) 
} 
FUNCTION FC3() 
{ 
B1=NAA+182+17+1 
B2=(2*176)+182+60-NAA-NBB+1 
P=BRN(B1,B2) 
} 
FUNCTION FC4() 
{ 
D1=NBB+60+17+1 
D2=(2*176)+182+60-NAA-NBB+1 
Q=BRN(D1,D2) 
} 
SAVE GIBBSGENETIC 
GIBBS (FC1(),FC2(),FC3(),FC4()) / SIZE=10000 NSAMP=1 BURNIN=1000 GAP=1 RSEED=1783 

USE GIBBSGENETIC 
LET PP=(1-Q1)*P1 
LET QQ=(1-P1)*Q1 
LET RR=1-PP-QQ 
FORMAT 10 5 
LET RBEP=(1-QQ)*((NAA1+182+17+2)/((NAA1+182+17+2)+, 
((2*176)+182+60-NAA1-NBB1+2))) 
LET RBEQ=(1-PP)*((NBB1+60+17+2)/((NBB1+60+17+2)+, 
((2*176)+182+60-NAA1-NBB1+2))) 
LET RBER=1-RBEP-RBEQ 
STATISTICS PP QQ RR RBEP RBEQ RBER / MAXIMUM MEAN, MEDIAN, 
MINIMUM SD VARIANCE N PTILE=2.5 50 97.5 
BEGIN 
DENSITY PP RBEP / HIST XMIN=0.20 XMAX=0.35 LOC=0,0 
DENSITY QQ RBEQ / HIST XMIN=0.05 XMAX=0.13 LOC=0,-3 
DENSITY RR RBER / HIST -75 LOC=0,-6 
END 
FORMAT 
The output is: 
| QQ RR RBEP RBEQ 
N of Cases | 10000 10000 10000 10000 
Minimum | 0.06190 0.60542 0.24135 0.08475 
Maximum | 0.12549 0.70509 0.28952 0.10793 
Median | 0.08980 0.65584 0.26456 0.09552 
Arithmetic Mean | 0.09003 0.65589 0.26470 0.09564 
Standard Deviation | 0.00944 0.01469 0.00700 0.00296 
Variance | 0.00009 0.00022 0.00005 0.00001 
Method = CLEVELAND | 
2.50000% | 0.07230 0.62616 0.25075 0.09009 
50.00000% | 0.08980 0.65584 0.26456 0.09552 
97.50000% | 0.10883 0.68477 0.27891 0.10165 
N of Cases 
Minimum 
Maximum 
Median 


Arithmetic Mean 

Standard Deviation 

Variance 

Method = CLEVELAND 
2.50000% 
50.00000% 
97.50000% 


Maximum likelihood estimates of p, q and r evaluated by the scoring method or the 
EM algorithm are 0.26444, 0.09317 and 0.64239. With the available prior information, 
the estimates of p, q and r are approximated by the Gibbs Sampling method. The 
empirical estimates of p, q and r are 0.25407, 0.09003 and 0.65589 respectively, and the 
Rao-Blackwellized estimates are 0.26470, 0.09564 and 0.63966 respectively. 
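For comparison, the following minimal Python sketch implements a data-augmentation Gibbs sampler for the same problem. It is a swapped-in variant, not the SYSTAT procedure above. Instead of the Beta-and-rescale approximation, it draws (p, q, r) jointly from the Dirichlet full conditional, which is exact for this model. It reuses the counts, the starting values P = 0.1 and Q = 0.5, and α = β = γ = 2 from the text. Binomial draws are made by summing Bernoulli trials. With a run this short, the means are only expected to fall close to the estimates quoted above.

```python
import random


def binomial(n, prob, rng):
    """Binomial draw as a sum of n Bernoulli trials (fine for counts of a few hundred)."""
    return sum(1 for _ in range(n) if rng.random() < prob)


def dirichlet(alphas, rng):
    """Dirichlet draw by normalizing independent Gamma(alpha, 1) variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def gibbs_abo(n_o, n_a, n_b, n_ab, alpha=2.0, beta=2.0, gamma=2.0,
              iters=2000, burnin=500, seed=1783):
    """Data-augmentation Gibbs sampler for ABO gene frequencies (p, q, r)."""
    rng = random.Random(seed)
    p, q = 0.1, 0.5  # starting values, as in the MCMC input above
    r = 1.0 - p - q
    samples = []
    for it in range(iters):
        # Split the A and B phenotypes into the unobserved genotype counts
        n_aa = binomial(n_a, p * p / (p * p + 2 * p * r), rng)
        n_bb = binomial(n_b, q * q / (q * q + 2 * q * r), rng)
        n_ao, n_bo = n_a - n_aa, n_b - n_bb
        # Given the completed counts, (p, q, r) has a Dirichlet posterior
        p, q, r = dirichlet([
            2 * n_aa + n_ao + n_ab + alpha,
            2 * n_bb + n_bo + n_ab + beta,
            2 * n_o + n_ao + n_bo + gamma,
        ], rng)
        if it >= burnin:
            samples.append((p, q, r))
    return samples


samples = gibbs_abo(176, 182, 60, 17)
means = [sum(s[i] for s in samples) / len(samples) for i in range(3)]
print([round(m, 3) for m in means])
```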
Manufacturing 
Quality Control 
The BOXES data consists of measurements taken each day on five randomly selected 
computer components. 
Variable Description 
DAY The day the sample was taken 
SAMPLE The sample number for the day (1-5) 
OHMS The resistance of the component in ohms 


Quality control charts are used regularly in manufacturing environments to keep track 
of manufacturing processes, diagnose problems, and improve operations. 

Potential analyses include descriptive statistics, quality control charts, ANOVA, and 
time series. 


R Chart of Ohms vs Days 


The input is: 
USE BOXES 
QC 
SHEWHART OHMS*DAY / TYPE=R PLIMITS = .025,.975 
The output is: 
Number of Lines of Input Data Read : 100.000 
Number with Missing Data or Zero Weight : 0.000 
Number of Samples to be Plotted : 20.000 
(Only Subgroups Containing Data are Plotted) 
Estimated Population Mean : -19.931 
Estimated Population Standard Deviation : 0.907 


Total N (Excluding Missing Data) : 100 
[Figure: R Chart for OHMS with Alpha = 0.05] 


X-bar Chart of Ohms vs Days 


The input is: 


USE BOXES 


QC 
SHEWHART OHMS*DAY / TYPE=XBAR 


The output is: 


Number of Lines of Input Data Read 

Number with Missing Data or Zero Weight 
Number of Samples to be Plotted 

(Only Subgroups Containing Data are Plotted) 
Estimated Population Mean 

Estimated Population Standard Deviation 
Total N (Excluding Missing Data) 


[Figure: X-BAR Chart for OHMS with Alpha = 0.0027, DAY on the x-axis] 
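The Alpha = 0.0027 in the X-bar chart title is the two-sided tail probability of a normal variable falling beyond three standard deviations. This corresponds to the familiar limits center ± 3σ/√n for means of subgroups of size n. The following Python sketch shows both computations. How SYSTAT estimates σ (for example, from ranges or from pooled standard deviations) is not stated here, so the inputs in the usage lines are illustrative.

```python
import math
from statistics import NormalDist


def xbar_limits(center, sigma, n, k=3.0):
    """Control limits center +/- k * sigma / sqrt(n) for subgroup means of size n."""
    half_width = k * sigma / math.sqrt(n)
    return center - half_width, center + half_width


def two_sided_alpha(k):
    """False-alarm probability of +/- k-sigma limits under normality."""
    return 2.0 * (1.0 - NormalDist().cdf(k))


print(round(two_sided_alpha(3.0), 4))  # 0.0027
print(xbar_limits(10.0, 2.0, 4))       # (7.0, 13.0)
```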
Medical Research 


Clinical Trials 


The CANCERDM data set contains information from a study of the effects of 
supplemental Vitamin C as part of routine cancer treatment for 100 patients and 1000 
controls (that is, 10 controls for each patient). 


Variable Description 

CASE Case ID 

ORGAN$ Organ affected by cancer 

SEX$ Sex of patient 

AGE Age of the patient 

SURVATD Survival of patient measured from first hospital attendance 
CNTLATD Survival of control group from first hospital attendance 


SURVUNTR Survival of patient from time cancer deemed untreatable 
CNTLUNTR Survival of control from time cancer deemed untreatable 
LOGSURVA Logarithm of SURVATD 

LOGCNTLA Logarithm of CNTLATD 

LOGSURVU Logarithm of SURVUNTR 

LOGCNTLU Logarithm of CNTLUNTR 
Clinical trials of this sort are the basis for evaluating the effectiveness of any new drug 
or medical treatment. They are a critical part of the FDA approval process in the U.S. 
and of similar evaluations in virtually all developed countries. 


Potential analyses include descriptive statistics, transformations, ANOVA and 
survival analysis. 


Box Plot of Selected Cancer Types 


The input is: 


USE CANCERDM 
SELECT (ORGAN$='Breast') OR (ORGAN$='Bronchus') OR, 
(ORGAN$='Colon') OR (ORGAN$='Ovary') OR, 
(ORGAN$='Stomach') 
THICK 3 
CATEGORY ORGAN$ 
BEGIN 
DEN LOGSURVA*ORGAN$ / BOX, SIZE=1.2, FILL=1, FCOLOR=BLUE, 
COLOR=YELLOW, YLAB='Log Survival', 
XLAB='Organ', HEI=5IN, WID=5IN, 
TITLE='Survival by Cancer Type' 
PLOT LOGSURVA*ORGAN$ / SMOOTH=LOWESS, TENSION=0, SIZE=0, 
COLOR=1, YLAB='', XLAB='', HEI=5IN, 
WID=5IN, TITLE='' 
END 
THICK 1 


The output is: 


[Figure: Survival by Cancer Type] 
Transformation of Survival Variable 
The input is: 
USE CANCERDM 


PPLOT SURVATD 


The output is: 


[Figure: Normal probability plot, Normal(0.0, 1.0) Quantile against SURVATD] 


ANOVA assumes that the data within each group are approximately normally distributed, 
which shows up as a roughly straight line in a normal probability plot. The distribution 
of SURVATD is clearly skewed and should be transformed. 


You can use the Graph Window to reduce the X-axis power from 1 through successive 
power transformations from 0.9 down to 0.1, and finally to 0, which corresponds to the 
log transformation. 
[Figure: Normal probability plot of SURVATD after the log transformation] 
Because this probability plot is much closer to a straight line, a log transformation is 
appropriate. 
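One way to put a number on "closer to a straight line" is the correlation between the sorted data and the matching normal quantiles, often called the probability plot correlation. A value of 1 means the points lie exactly on a line. The following Python sketch assumes plotting positions (i - 0.5)/n, which may differ from the ones SYSTAT uses. The skewed sample is hypothetical log-normal data, not CANCERDM.

```python
import math
from statistics import NormalDist


def normal_scores(n):
    """Normal quantiles at plotting positions (i - 0.5)/n (an assumed convention)."""
    return [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ppcc(values):
    """Correlation of sorted data with normal scores; 1.0 is a straight probability plot."""
    return pearson(sorted(values), normal_scores(len(values)))


# Hypothetical right-skewed (log-normal) sample
skewed = [math.exp(1 + 0.8 * z) for z in normal_scores(50)]
print(round(ppcc(skewed), 3), round(ppcc([math.log(v) for v in skewed]), 3))
```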


Survival Rates of Melanoma Patients 


MELANMDM data contains reports on melanoma patients. 


Variable Description 

TIME The survival time for melanoma patients in days 
CENSOR The censoring variable 

WEIGHT The weight variable 

ULCER Presence or absence of ulcers 

DEPTH Depth of ulceration 

NODES Number of lymph nodes that are affected 

SEX$ The sex of the patient 


SEX The stratification variable coded for the analysis 


Survival studies are used in the area of drug development. Survival rates of the patients 
on an experimental drug are studied to determine the effectiveness of the drug in 
treating melanoma. Sex may be used as a stratification variable to examine the 
difference in the survival patterns of male and female patients. 


Potential analyses include survival analysis and logistic regression. 
Stratified Cox Regression 


The input is: 


USE MELNMADM 


SURVIVAL 
MODEL TIME =ULCER, DEPTH, NODES / CENSOR=CENSOR STRATA=SEX 


ESTIMATE / COX 
LTAB / CHAZ 


The output is: 


Time Variable : TIME 
Censor Variable : CENSOR 


Input Records : 69 
Records Kept for Analysis : 69 


Censoring | Observations 


Exact Failures | 36 
Right Censored | 33 
Covariate Means 


ULCER | 1.507 
DEPTH | 2.562 
NODES | 3.246 

Type 1: Exact Failures and Right Censoring 
Overall Time Range: [72.000 , 7307.000] 
Failure Time Range: [72.000 , 1606.000] 
Stratification on SEX specified, 2 levels 
Cox Proportional Hazards Estimation 


With stratification on SEX 


Iteration Step Log-Likelihood 


Results after 4 Iterations 


Final Convergence Criterion : 0.000 
Maximum Gradient Element : 0.000 
Initial Score Test of Regression : 32.533 with 3 df 
Significance Level (p-value) : 0.000 


Final Log-Likelihood : -103.533 
AIC : 213.066 
Schwarz's BIC : 217.816 


-2*[LL(0)-LL(4)] Test : 18.063 with 3 df 
Significance Level (p-value) : 0.000 

Parameter | Estimate Standard Error Z p-value 
ULCER | -0.817 0.385 -2.123 0.034 
DEPTH | 0.083 0.053 1.587 0.112 
NODES | 0.131 0.057 2.289 0.022 


95.0% Confidence Interval 
Parameter | Estimate Lower Upper 
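The p-values in the Cox parameter table are two-sided normal tail probabilities of the Wald Z statistics. AIC adds twice the number of parameters to -2 times the final log-likelihood. The following Python sketch reproduces these values from the printed output. For BIC, the sketch uses n = 36, the number of exact failures. That choice reproduces the printed 217.816 to within rounding, but it is an inference from the numbers, not documented SYSTAT behavior.

```python
import math


def two_sided_p(z):
    """Two-sided normal p-value for a Wald Z statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def aic(log_lik, k):
    return -2.0 * log_lik + 2.0 * k


def bic(log_lik, k, n):
    return -2.0 * log_lik + k * math.log(n)


print(round(two_sided_p(-2.123), 3), round(two_sided_p(2.289), 3))  # 0.034 0.022
print(round(aic(-103.533, 3), 3))                                  # 213.066
print(round(bic(-103.533, 3, 36), 3))  # n = 36 events (assumption)
```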


Life Table for Last Cox Model 

All the Data will be Used 

The following results are for SEX = 0. 
Evaluated at Mean Values of Covariates: 


ULCER : 1.507 
DEPTH : 2.562 
NODES : 3.246 


No Tied Failure Times 


Number at Risk Number Failing Time 
31.000 1.000 133.000 
30.000 1.000 184.000 
29.000 1.000 251.000 
28.000 1.000 320.000 
27.000 1.000 391.000 
26.000 1.000 414.000 
25.000 1.000 434.000 
23.000 1.000 471.000 
22.000 1.000 544.000 
20.000 1.000 788.000 
19.000 1.000 812.000 
15.000 1.000 1151.000 
13.000 1.000 1239.000 
5.000 1.000 1579.000 
4.000 1.000 1606.000 


The following results are for SEX = 1. 
Evaluated at Mean Values of Covariates: 


ULCER : 1.507 
DEPTH : 2.562 
NODES : 3.246 


No Tied Failure Times 


Number at Risk Number Failing Time Model Survival Probability 
38.000 1.000 72.000 0.998 
37.000 1.000 125.000 0.973 
36.000 1.000 127.000 0.949 
35.000 1.000 142.000 0.923 
34.000 1.000 151.000 0.898 
33.000 1.000 154.000 0.873 
32.000 1.000 176.000 0.848 
31.000 1.000 229.000 0.823 
30.000 1.000 256.000 0.798 
29.000 1.000 362.000 0.772 
28.000 1.000 422.000 0.747 
27.000 1.000 441.000 0.720 
26.000 1.000 465.000 0.692 
25.000 1.000 495.000 0.663 
23.000 1.000 584.000 0.634 
22.000 1.000 645.000 0.603 
21.000 1.000 659.000 0.569 
20.000 1.000 749.000 0.536 
18.000 1.000 803.000 0.501 
16.000 1.000 1020.000 0.464 
15.000 1.000 1042.000 0.427 
[Figure: Cumulative Hazard Plot] 


Log-Rank Test, Stratification on SEX, Strata Range 1 to 2 


| Chi-square 
| Statistic with 
Method | 1 df p-value 
Mantel | 
Breslow-Gehan | 1.589 0.207 
Tarone-Ware | 1.167 0.280 
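A chi-square statistic with 1 df is the square of a standard normal variable, so its upper-tail p-value equals P(|Z| > √x) = erfc(√(x/2)). The following Python sketch checks the Breslow-Gehan and Tarone-Ware p-values from their printed statistics. Because the statistics are rounded, agreement is expected only to about the third decimal.

```python
import math


def chi2_1df_p(statistic):
    """Upper-tail p-value of a 1-df chi-square statistic, P(|Z| > sqrt(x))."""
    return math.erfc(math.sqrt(statistic / 2.0))


for method, stat in [("Breslow-Gehan", 1.589), ("Tarone-Ware", 1.167)]:
    print(method, round(chi2_1df_p(stat), 3))
```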


Stratified Kaplan-Meier Estimation 


The input is: 
USE MELNMADM 
SURVIVAL 
MODEL TIME / CENSOR=CENSOR, STRATA=SEX 
ESTIMATE 
LTAB 
The output is: 
Time Variable : TIME 
Censor Variable : CENSOR 
Input Records : 69 
Records Kept for Analysis : 69 
Censoring | Observations 
Exact Failures | 36 
Right Censored | 33 


Type 1: Exact Failures and Right Censoring 


Overall Time Range: [72.000 , 7307.000] 
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Failure Time Range: [72.000 , 1606.000] 
Stratification on SEX specified, 2 levels 
Nonparametric Estimation 
Table of Kaplan-Meier Probabilities 
With stratification on SEX 
All the Data will be Used 
The following results are for SEX = 0. 
                                                                      95.0% Confidence Interval
Number at Risk  Number Failing  Time      K-M Probability  Standard Error  Lower  Upper
31.000          1.000           133.000   0.968            0.032           0.792  0.995
30.000          1.000           184.000   0.935            0.044           0.766  0.983
29.000          1.000           251.000   0.903            0.053           0.729  0.968
28.000          1.000           320.000   0.871            0.060           0.692  0.950
27.000          1.000           391.000   0.839            0.066           0.655  0.929
26.000          1.000           414.000   0.806            0.071           0.619  0.908
25.000          1.000           434.000   0.774            0.075           0.584  0.885
23.000          1.000           471.000   0.741            0.079           0.547  0.861
22.000          1.000           544.000   0.707            0.082           0.512  0.836
20.000          1.000           788.000   0.672            0.085           0.475  0.808
19.000          1.000           812.000   0.636            0.088           0.439  0.780
15.000          1.000           1151.000  0.594            0.092           0.394  0.747
13.000          1.000           1239.000  0.548            0.095           0.346  0.711
5.000           1.000           1579.000  0.438            0.124           0.199  0.657
4.000           1.000           1606.000  0.329            0.133           0.103  0.580
Mean Survival Time 
Mean Survival 95.0% Confidence Interval 
Time Lower Upper 
2395.302 2199.374 2591.231 


Survival Quantiles 
95.0% Confidence Interval 


Probability Survival Time Lower Upper 
0.250 . 1579.000 .
0.500 1579.000 788.000 1606.000 
0.750 471.000 251.000 1151.000 
The following results are for SEX = 1.
Number at Risk  Number Failing  Time      K-M Probability  Standard Error
38.000          1.000           72.000    0.974            0.026
37.000          1.000           125.000   0.947            0.036
36.000          1.000           127.000   0.921            0.044
35.000          1.000           142.000   0.895            0.050
34.000          1.000           151.000   0.868
33.000          1.000           154.000   0.842
32.000          1.000           176.000   0.816
31.000          1.000           229.000   0.789
30.000          1.000           256.000   0.763
29.000          1.000           362.000   0.737
28.000          1.000           422.000   0.711
27.000          1.000           441.000   0.684
26.000          1.000           465.000   0.658
25.000          1.000           495.000   0.632
23.000          1.000           584.000   0.604
22.000          1.000           645.000   0.577
21.000          1.000           659.000   0.549
20.000          1.000           749.000   0.522
18.000          1.000           803.000   0.493
16.000          1.000           1020.000  0.462
15.000          1.000           1042.000  0.431

Mean Survival Time
Mean Survival  95.0% Confidence Interval
Time           Lower     Upper
3404.857       3288.624  3521.090

Survival Quantiles
                               95.0% Confidence Interval
Probability  Survival Time     Lower     Upper
0.500        803.000           465.000   1042.000
0.750        362.000           142.000   584.000
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Survivor Function 
(Figure: survivor function with Lower Limit and Upper Limit curves)
Log-Rank Test, Stratification on SEX, Strata Range 1 to 2 


              | Chi-square
              | Statistic with
Method        | 1 df            p-value
Mantel        | 0.568           0.451
Breslow-Gehan | 1.589           0.207
Tarone-Ware   | 1.167           0.280
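The K-M Probability and Standard Error columns can be checked by hand. The following is a minimal Python sketch, not SYSTAT's own code, assuming the product-limit (Kaplan-Meier) estimator S(t) = product of (n - d)/n over the failure times up to t, and Greenwood's formula SE = S(t) * sqrt(sum of d/(n(n - d))) for the standard error; the printed SEX = 0 values are consistent with both. The function name kaplan_meier is hypothetical; its inputs are the Number at Risk and Number Failing columns of the SEX = 0 table.

```python
import math


def kaplan_meier(at_risk, failing):
    """Return (S(t), Greenwood SE) at each failure time.

    at_risk[i] and failing[i] are the number at risk and the number failing
    at the i-th failure time, listed in time order.
    """
    surv = 1.0
    greenwood_sum = 0.0
    rows = []
    for n, d in zip(at_risk, failing):
        surv *= (n - d) / n                  # product-limit step
        greenwood_sum += d / (n * (n - d))   # Greenwood variance term (requires n > d)
        rows.append((surv, surv * math.sqrt(greenwood_sum)))
    return rows


# SEX = 0: number at risk at each failure time, with one failure at each time.
n_sex0 = [31, 30, 29, 28, 27, 26, 25, 23, 22, 20, 19, 15, 13, 5, 4]
d_sex0 = [1] * len(n_sex0)

for n, (s, se) in zip(n_sex0, kaplan_meier(n_sex0, d_sex0)):
    print(f"{n:3d}  {s:.3f}  {se:.3f}")
```

The printed rows agree with the SEX = 0 table to three decimals. Greenwood's term is undefined when everyone at risk fails (n = d); the sketch does not handle that case, which does not arise in these data.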


Weibull Estimation 
The input is: 
USE MELNMADM 
SURVIVAL 
MODEL TIME = ULCER, DEPTH, NODES / CENSOR=CENSOR 
ESTIMATE / EWB 
QNTL 
The output is: 
Time Variable : TIME 
Censor Variable : CENSOR 
Input Records : 69 
Records Kept for Analysis : 69 
Censoring      | Observations
Exact Failures | 36
Right Censored | 33
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Covariate Means 


ULCER | 1.507 
DEPTH | 2.562 
NODES | 3.246 


Type 1: Exact Failures and Right Censoring 


Overall Time Range: [72.000 , 7307.000]
Failure Time Range: [72.000 , 1606.000] 
Weibull Model B(1)--shape, B(2)--scale 
Extreme value parameterization 


Iteration Step Log-Likelihood Method 


o 
1 -306.508 N-R 
Results after 11 Iterations 


Final Convergence Criterion      : 0.000
Maximum Gradient Element         : 0.000
Initial Score Test of Regression : 14.738 with 5 df
Significance Level (p-value)     : 0.012

Final Log-Likelihood -306.508

AIC 623.016 

Schwarz's BIC 634.187 

Parameter | Estimate  Standard Error  Z       p-value
B(1)      | 1.202     0.161           7.470   0.000
B(2)      | 7.277     0.728           9.990   0.000
ULCER     | 0.776     0.431           1.800   0.072
DEPTH     | -0.154    0.057           -2.675  0.007
NODES     | -0.063    0.020           -3.162  0.002


1.0/B(1): 0.832, EXP(B(2)): 1446.887


Vector | Mean Failure Time  Variance
ZERO   | 1595.592           3716876.337
MEAN   | 900.377            1183539.495

Coefficient of Variation: 1.208


95.0% Confidence Interval 


Parameter | Estimate  Lower   Upper
B(1)      | 1.202     0.886   1.517
B(2)      | 7.277     5.849   8.705
ULCER     | 0.776     -0.069  1.622
DEPTH     | -0.154    -0.266  -0.041
NODES     | -0.063    -0.102  -0.024
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Table of Estimated Quantiles for Last Accelerated Weibull Model 


Covariate Vector 
ULCER : 1.507 
DEPTH : 2.562 
NODES : 3.246 


Quantile Estimated Time 


0.999 0.637 
0.995 4.418 
0.990 10.193 
0.975 30.935 
0.950 72.263 
0.900 171.618 
0.750 573.787 
0.667 866.645 
0.500 1650.688 
0.333 2870.859 
0.250 3796.547 
0.100 6985.190 
0.050 9583.149 
0.025 12306.215 
0.010 16065.792 
0.005 19013.916 
0.001 26151.527 


Standard Error 
of Log Time 


-068 
.815 
+707 
«567 
-463 
-363 
.248 


oooooor 


95.0% Confidence Interval 
Upper 


11313.122 60452.137 


Log of 
Estimated Time 
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222 
-207 
.221 
+237 
~286 
pr | 
-343 
-372 
asi 
-428 


oooooooooo 


Quantile Plot 


Probability 
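Several summary values in the Weibull output follow from the parameter estimates. The Python sketch below is illustrative, not SYSTAT's code. It assumes that the extreme-value parameterization means log(T) = B(2) + B(1)*W for the ZERO covariate vector, with W a standard extreme-value variate, so that the Weibull shape is 1.0/B(1) and the scale is EXP(B(2)), as printed. The mean and variance then follow the standard Weibull formulas, mean = scale * Gamma(1 + B(1)) and variance = scale^2 * [Gamma(1 + 2*B(1)) - Gamma(1 + B(1))^2]. AIC and Schwarz's BIC are taken as -2*LL + 2k and -2*LL + k*ln(n), with k = 5 estimated parameters and n = 69 records. The function names are hypothetical.

```python
import math


def weibull_summary(b1, b2):
    """Shape, scale, mean, variance and coefficient of variation of T,
    assuming log(T) = b2 + b1 * W with W a standard extreme-value variate."""
    shape = 1.0 / b1                  # printed as 1.0/B(1)
    scale = math.exp(b2)              # printed as EXP(B(2))
    g1 = math.gamma(1.0 + b1)         # Gamma(1 + 1/shape)
    g2 = math.gamma(1.0 + 2.0 * b1)   # Gamma(1 + 2/shape)
    mean = scale * g1
    variance = scale ** 2 * (g2 - g1 ** 2)
    return shape, scale, mean, variance, math.sqrt(variance) / mean


def information_criteria(log_lik, n_params, n_obs):
    """AIC = -2 LL + 2k and Schwarz's BIC = -2 LL + k ln(n)."""
    aic = -2.0 * log_lik + 2.0 * n_params
    bic = -2.0 * log_lik + n_params * math.log(n_obs)
    return aic, bic


# B(1) and B(2) from the parameter table (ZERO covariate vector).
shape, scale, mean, variance, cv = weibull_summary(1.202, 7.277)
print(f"shape {shape:.3f}  scale {scale:.1f}  mean {mean:.1f}  CV {cv:.3f}")

# Final log-likelihood; parameters B(1), B(2), ULCER, DEPTH, NODES; 69 records.
aic, bic = information_criteria(-306.508, 5, 69)
print(f"AIC {aic:.3f}  BIC {bic:.3f}")
```

Because B(1) and B(2) are printed to only three decimals, the scale, mean, and variance agree with the ZERO-vector line of the output to within a fraction of a percent, while the AIC (623.016) and BIC (634.187) match exactly.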


Psychology 


Day Care Effects on Child Development 


The DAYCREDM data consists of three measures of a child’s social competence: a 
measure for behavior at dinner, a measure for behavior in dealing with strangers, and 
a measure involving social problem solving in a cognitive test. In addition, there is a 
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categorical variable for the setting in which a child was raised, either by parents, by a 
babysitter, or in a day-care center. 


Variable Description 

SETTING$ Daycare setting in which child is raised
SETTING Coded setting 

DINNER Behavioral measure of skill during dinner 
STRANGER Measure of skill in dealing with a stranger 
PROBLEM Social problem solving skill in a cognitive test 


An important issue in child development is whether the daycare setting in which a child
is raised has a differential effect on social behavior. This data set offers three measures
of social competence for children in three different daycare settings: some were cared for
during the day by parents, others by a babysitter, and the rest in a daycare center. The
data set is a good candidate for MANOVA because it offers three ways of measuring a
single latent variable, social competence. One critical issue is whether the data satisfy
the assumptions of MANOVA, especially homogeneity of variance and covariance across
settings.
Potential analyses include ANOVA, MANOVA, regression, and factor analysis. 


MANOVA 


The input is: 


USE DAYCREDM 
MANOVA 
PLENGTH LONG 
CATEGORY SETTING 
DEPEND DINNER, STRANGER, PROBLEM 


ESTIMATE 


The output is: 
Dependent Variable Means 
DINNER STRANGER PROBLEM 


1288.188 714.250 54.083 


Estimates of Effects B = (X'X)^-1 X'Y

Factor   | Level  DINNER    STRANGER  PROBLEM
CONSTANT |        1308.795  690.589   51.733
SETTING  | 1      -166.479  -62.116   -2.207
SETTING  | 2      109.905   -126.189  -12.533
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Standardized Estimates of Effects 


Factor   | Level  DINNER  STRANGER  PROBLEM
CONSTANT |        0.000   0.000     0.000
SETTING  | 1      -0.278  -0.176    -0.069
SETTING  | 2      0.156   -0.304    -0.331


Total Sum of Product Matrix

         | DINNER         STRANGER     PROBLEM
DINNER   | 13624387.313
STRANGER | 2382747.750    4713117.000
PROBLEM  | 241634.250     218044.000   39267.667


Residual Sum of Product Matrix E'E = Y'Y-Y'XB 


         | DINNER        STRANGER     PROBLEM
DINNER   | 12936578.626
STRANGER | 2099145.095   3833722.926
PROBLEM  | 230259.126    149554.411   33741.074


Residual Covariance Matrix Sy.x

         | DINNER      STRANGER    PROBLEM
DINNER   | 287479.525
STRANGER | 46647.669   85193.843
PROBLEM  | 5116.869    3323.431    749.802


Residual Correlation Matrix Ry.x

         | DINNER  STRANGER  PROBLEM
DINNER   | 1.000
STRANGER | 0.298   1.000
PROBLEM  | 0.349   0.416     1.000


Information Criteria

AIC             | 1878.445
AIC (Corrected) | 1893.445
Schwarz's BIC   | 1906.513


SETTING    : 1
N of Cases : 19

Least Squares Means

               | DINNER    STRANGER  PROBLEM
LS Mean        | 1142.316  628.474   49.526
Standard Error | 123.006   66.962    6.282

SETTING    : 2
N of Cases : 10

Least Squares Means

               | DINNER    STRANGER  PROBLEM
LS Mean        | 1418.700  564.400   39.200
Standard Error | 169.552   92.301    8.659

SETTING    : 3
N of Cases : 19

Least Squares Means

               | DINNER    STRANGER  PROBLEM
LS Mean        | 1365.368  878.895   66.474
Standard Error | 123.006   66.962    6.282

Test for effect called: CONSTANT

Null Hypothesis Contrast AB

DINNER    STRANGER  PROBLEM
1308.795  690.589   51.733

Inverse Contrast A(X'X)^-1 A'

0.023

Hypothesis Sum of Product Matrix H = B'A'(A(X'X)^-1 A')^-1 AB

         | DINNER         STRANGER       PROBLEM
DINNER   | 75105991.386
STRANGER | 39629901.926   20910836.774
PROBLEM  | 2968749.169    1566469.415    117347.118

Error Sum of Product Matrix G = E'E

         | DINNER         STRANGER       PROBLEM
DINNER   | 12936578.626
STRANGER | 2099145.095    3833722.926
PROBLEM  | 230259.126     149554.411     33741.074

Univariate F Tests

Source   | Type III SS    df  Mean Squares   F-ratio  p-value
DINNER   | 75105991.386   1   75105991.386   261.257  0.000
Error    | 12936578.626   45  287479.525
STRANGER | 20910836.774   1   20910836.774   245.450  0.000
Error    | 3833722.926    45  85193.843
PROBLEM  | 117347.118     1   117347.118     156.504  0.000
Error    | 33741.074      45  749.802

Multivariate Test Statistics

Statistic              | Value  F-ratio  df     p-value
Wilks's Lambda         | 0.100  128.489  3, 43  0.000
Pillai Trace           | 0.900  128.489  3, 43  0.000
Hotelling-Lawley Trace | 8.964  128.489  3, 43  0.000

Test of Residual Roots

            | Chi-square
1 through 1 | 102.306

Canonical Correlations

0.948

Dependent Variable Canonical Coefficients Standardized
by Conditional (within Groups) Standard Deviations

STRANGER | 0.523
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Canonical Loadings (Correlations between Conditional 
Dependent Variables and Dependent Canonical Factors) 


DINNER   | 0.805
STRANGER | 0.780
PROBLEM  | 0.623


Test for effect called: SETTING

Null Hypothesis Contrast AB

  | DINNER    STRANGER  PROBLEM
1 | -166.479  -62.116   -2.207
2 | 109.905   -126.189  -12.533

Inverse Contrast A(X'X)^-1 A'

  | 1       2
1 | 0.040
2 | -0.028  0.056

Hypothesis Sum of Product Matrix H = B'A'(A(X'X)^-1 A')^-1 AB

         | DINNER      STRANGER    PROBLEM
DINNER   | 687808.686
STRANGER | 283602.655  879394.074
PROBLEM  | 11375.124   68489.589   5526.593


Error Sum of Product Matrix G = E'E

         | DINNER        STRANGER     PROBLEM
DINNER   | 12936578.626
STRANGER | 2099145.095   3833722.926
PROBLEM  | 230259.126    149554.411   33741.074


Univariate F Tests

Source   | Type III SS   df  Mean Squares  F-ratio
DINNER   | 687808.686    2   343904.343    1.196
Error    | 12936578.626  45  287479.525
STRANGER | 879394.074    2   439697.037    5.161
Error    | 3833722.926   45  85193.843
PROBLEM  | 5526.593      2   2763.296      3.685
Error    | 33741.074     45  749.802


Multivariate Test statistics 


Statistic 

Wilks's Lambda 

Pillai Trace 
Hotelling-Lawley Trace 


THETA s M N


0.232 2 0.000 20.50 


Test of Residual Roots 


1 through 2 | 
2 through 2 | 
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Canonical Correlations 
0.482 0.241 


Dependent Variable Canonical Coefficients Standardized 
by Conditional (within Groups) Standard Deviations 


1 2


DINNER =0. .980 
STRANGER 0.723 0.288 
PROBLEM 0.554 -0.424 


Canonical Loadings (Correlations between Conditional
Dependent Variables and Dependent Canonical Factors)

         | 1      2
DINNER   | 0.068  0.918
STRANGER | 0.852  0.404
PROBLEM  | 0.736  0.037
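Several MANOVA tables above are simple functions of one another. The Python sketch below is illustrative, not SYSTAT's code. It derives the residual covariance matrix as E'E divided by the 45 error degrees of freedom and converts it to correlations. It recovers the least squares means from the effect estimates, assuming effect coding in which the SETTING = 3 effect is minus the sum of the level 1 and level 2 effects (an inference that reproduces the printed SETTING 3 means). It computes each LS-mean standard error as sqrt(residual variance / N of cases) and forms the univariate F-ratios for SETTING as (H_kk / 2) / (E_kk / 45). The variable names are hypothetical.

```python
import math

DVS = ["DINNER", "STRANGER", "PROBLEM"]
DF_ERROR = 45

# Residual sum of product matrix E'E from the output (symmetric).
EE = [
    [12936578.626, 2099145.095, 230259.126],
    [2099145.095, 3833722.926, 149554.411],
    [230259.126, 149554.411, 33741.074],
]

# Residual covariance S = E'E / df; correlation r_ij = s_ij / sqrt(s_ii * s_jj).
cov = [[v / DF_ERROR for v in row] for row in EE]
corr = [[cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(3)]
        for i in range(3)]

# Effect-coded estimates; the last level's effect is minus the sum of the others.
constant = [1308.795, 690.589, 51.733]
effects = {1: [-166.479, -62.116, -2.207], 2: [109.905, -126.189, -12.533]}
effects[3] = [-(a + b) for a, b in zip(effects[1], effects[2])]
n_cases = {1: 19, 2: 10, 3: 19}

for level in (1, 2, 3):
    ls_mean = [c + e for c, e in zip(constant, effects[level])]
    std_err = [math.sqrt(cov[k][k] / n_cases[level]) for k in range(3)]
    print("SETTING", level, [f"{m:.3f}" for m in ls_mean],
          [f"{s:.3f}" for s in std_err])

# Univariate F for SETTING: hypothesis mean square over error mean square.
H_DIAG = [687808.686, 879394.074, 5526.593]
f_ratios = {name: (h / 2) / cov[k][k]
            for k, (name, h) in enumerate(zip(DVS, H_DIAG))}
print({name: round(f, 3) for name, f in f_ratios.items()})
```

Apart from third-decimal rounding, the printed values agree with the Residual Covariance Matrix, the Residual Correlation Matrix, the three Least Squares Means tables, and the SETTING Univariate F Tests.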
Scatterplot Matrix (SPLOM) 


The input is: 


USE DAYCREDM 

LABEL SETTING / 1='Parent', 2 ='Sitter', 3='Center' 

SPLOM DINNER STRANGER PROBLEM /GROUP=SETTING, DEN=NORM, ELL, 
DASH=1,7,10, COLOR=3,1,2, FILL, SYMBOL=1,4,8, OVERLAY, 
TITLE='Social Competence Measures Across Settings' 
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The output is: 


Social Competence Measures Across Settings
(Figure: scatterplot matrix of DINNER, STRANGER, and PROBLEM)


A scatterplot matrix can be used to check the assumptions of MANOVA, that is, that the
variances and covariances are homogeneous across settings. The SPLOM does not show
any systematic violations of these assumptions that would require a variable
transformation.


Analysis of Fear Symptoms of U.S. Soldiers using Item-Response Theory 


The COMBATDM data contains reports of fear symptoms by selected U.S. soldiers after
being withdrawn from World War II combat. Nine symptoms are included for analysis,
and the number of soldiers with each symptom profile is reported.


Variable   Description

COUNT      Number of soldiers in each profile of symptom
POUNDING   Violent pounding of the heart
SINKING    Sinking feeling of the stomach
SHAKING    Shaking or trembling all over
NAUSEOUS   Feeling sick at the stomach
STIFF      Cold sweat
FAINT      Feeling of weakness or feeling faint
VOMIT      Vomiting
BOWELS     Losing control of the bowels
URINE      Urinating in the pants
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Determining which withdrawal fear symptoms are common to the soldiers after a 
combat and the probability of each taking place is useful in preparing the soldiers for 
future encounters. 

Potential analyses include test item analysis, factor analysis, multidimensional
scaling, and cluster analysis. 


Classical Test Item Analysis 


The input is: 


USE COMBATDM 

TESTAT 
MODEL POUNDING.. URINE 
FREQUENCY COUNT 
IDVAR COUNT 
ESTIMATE/CLASSICAL 


The output is: 
Case frequencies determined by value of variable COUNT 
Data Below are Based on 93 Complete Cases for 9 Data Items 


Test Score Statistics

                   | Total  Average  Odd    Even
Mean               | 4.538  0.504    2.473  2.065
Standard Deviation | 2.399  0.267    1.333  1.277
Standard Error     | 0.250  0.028    0.139  0.133
Maximum            | 9.000  1.000    5.000  4.000
Minimum            | 1.000  0.111    0.000  0.000
N of Cases         | 93     93       93     93


Internal Consistency Data 


Split-half Correlation : 0.690 
Spearman-Brown Coefficient : 0.816 
Guttman (Rulon) Coefficient : 0.816
Coefficient Alpha - All Items : 0.787
Coefficient Alpha - Odd Items : 0.613
Coefficient Alpha - Even Items : 0.661 


Approximate Standard Error of Measurement of Total Score for 15 z score Intervals 


Total Score N Standard Error 


-4.458 0 
-3.258 0 
-2.059 0 
-0.860 0 
0.340 10 
1.539 16
2.739 6 
3.938 29 
Sei? 10 
6.337 8 
7.536 8 
8.735 6 


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2.250 9.935 0 . 
2.750 11.134 0 . 
3.250 12.334 0 
Item Reliability Statistics 
                    Standard                  Reliability
Item Label   Mean   Deviation  Item Total R   Index        Excl Item R

1 POUNDING 0.903 0.296 0.331 0.098 0.215
2 SINKING 0.785 0.411 0.499 0.205 0.354 
3 SHAKING 0.559 0.496 0.678 0.336 0.539 
4 NAUSEOUS 0.613 0.487 0.721 0.351 0.599 
5 STIFF 0.538 0.499 0.693 0.346 0.559 
6 FAINT 0.452 0.498 0.715 0.356 0.588 
7 VOMIT 0.376 0.484 0.622 0.301 0.472 
8 BOWELS 0.215 0.411 0.625 0.257 0.502 
9 URINE 0.097 0.296 0.503 0.149 0.402 


Item Reliability Statistics (contd.)

Excl Item Alpha
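The internal consistency coefficients can be recomputed from the standard deviations in the Test Score Statistics and Item Reliability Statistics tables. The Python sketch below is illustrative, not SYSTAT's code, and assumes the usual definitions: the split-half correlation r between odd and even half scores, obtained from Var(O + E) = Var(O) + Var(E) + 2 Cov(O, E); the Spearman-Brown coefficient 2r/(1 + r); the Guttman (Rulon) coefficient 2[1 - (s_odd^2 + s_even^2)/s_total^2]; and coefficient alpha k/(k - 1)[1 - (sum of s_i^2)/s_total^2]. It also checks that the Reliability Index column equals each item's standard deviation times its item-total correlation, a relationship inferred from the printed values. The function names are hypothetical.

```python
def split_half_stats(sd_total, sd_odd, sd_even):
    """Split-half correlation, Spearman-Brown and Guttman (Rulon) coefficients."""
    var_t, var_o, var_e = sd_total ** 2, sd_odd ** 2, sd_even ** 2
    cov_oe = (var_t - var_o - var_e) / 2.0   # from Var(O+E) = Var(O) + Var(E) + 2 Cov(O, E)
    r = cov_oe / (sd_odd * sd_even)
    spearman_brown = 2.0 * r / (1.0 + r)     # step the half-test r up to full length
    guttman = 2.0 * (1.0 - (var_o + var_e) / var_t)
    return r, spearman_brown, guttman


def coefficient_alpha(item_sds, sd_total):
    """Coefficient alpha: k/(k-1) * (1 - sum of item variances / total variance)."""
    k = len(item_sds)
    item_var = sum(sd ** 2 for sd in item_sds)
    return k / (k - 1) * (1.0 - item_var / sd_total ** 2)


item_sds = [0.296, 0.411, 0.496, 0.487, 0.499, 0.498, 0.484, 0.411, 0.296]
item_total_r = [0.331, 0.499, 0.678, 0.721, 0.693, 0.715, 0.622, 0.625, 0.503]

r, sb, gut = split_half_stats(2.399, 1.333, 1.277)
alpha = coefficient_alpha(item_sds, 2.399)
print(f"split-half r {r:.3f}  Spearman-Brown {sb:.3f}  Guttman {gut:.3f}  alpha {alpha:.3f}")

# Reliability Index: apparently item SD times item-total correlation.
reliability_index = [sd * rt for sd, rt in zip(item_sds, item_total_r)]
print([f"{v:.3f}" for v in reliability_index])
```

Because the inputs are rounded to three decimals, the results match the printed coefficients (0.690, 0.816, 0.816, and 0.787) to about the third decimal place.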


Logistic Test Item Analysis


The input is: 


USE COMBATDM 

TESTAT 
MODEL POUNDING.. URINE 
FREQUENCY COUNT 
IDVAR COUNT 
ESTIMATE/LOG1 


The output is: 


Case frequencies determined by value of variable COUNT 

93 Cases were processed, each containing 9 items 

6 Cases were deleted by editing for missing data or for zero or 
perfect total scores after item editing. 

0 Items were deleted by editing for missing data or for zero or 
perfect total scores after item editing. 

Data below are based on 87 Cases and 9 Items 


Total Score Mean : 4.230
Standard Deviation : 2.164 
-Log (Likelihood) Using Initial Parameter Estimates : 270.982 
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STEP 1 Convergence Criterion : 0.050 


Stage 1: Estimate Ability with Item Parameter(s) Constant 


-Log 
(Likelihood) Change LR 


270.071 -0.911 2.486 
Greatest Change in Ability Estimate was for Case 80 


Change from Old Estimate : 0.134 
Current Estimate : 2.005 


Stage 2: Estimate Item Parameter(s) with Ability Constant


-Log 
(Likelihood) Change LR 


269.662 -0.409 1.505 


Greatest Change in Difficulty Estimate was for Item BOWELS 


Change from Old Estimate : 0.084
Current Estimate : 1.301 


Current Value of Discrimination Index : 1.206 
STEP 2 Convergence Criterion : 0.050 


Stage 1: Estimate Ability with Item Parameter(s) Constant


-Log 
(Likelihood) Change LR 


269.590 -0.072 1.075 


Greatest Change in Ability Estimate was for Case 87 
Change from Old Estimate : 0.006 
Current Estimate : 2.011 


Stage 2: Estimate Item Parameter(s) with Ability Constant
-Log 
(Likelihood) Change LR 


269.549 -0.041 1.042

Greatest Change in Difficulty Estimate was for Item BOWELS

Change from Old Estimate : 0.032
Current Estimate : 1.315


Current Value of Discrimination Index : 1.226 
Sociology 


World Population Characteristics 


The WORLDDM data contains 1990 data on 30 countries, including birth and death
rates, life expectancies (male and female), type of government, whether the country is
mostly urban or rural, and latitude and longitude.


Variable Description 

COUNTRY$ Country name 

BIRTH_RT Number of births per 1000 people in 1990 
DEATH_RT Number of deaths per 1000 people in 1990 
MALE Years of life expectancy for males 
FEMALE Years of life expectancy for females 
GOV$ Type of government

URBAN$ Rural or city

LAT Latitude of the country's centroid 


LON Longitude of the country's centroid 
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Countries are often classified into categories (for example, developed or third world) 
based on certain socioeconomic criteria (one key group of criteria being population 
statistics). This data set contains such criteria for 30 countries of various regions and 
per capita income levels, allowing countries to be clustered according to population 
characteristics. In addition, variables such as the type of government and whether the 
country is mostly rural or urban may have an impact on these population 
characteristics. 

Potential analyses include ANOVA, regression, cluster analysis, multidimensional 
scaling, and mapping. 


Cluster Analysis 


The input is: 


USE WORLDDM 
CLUSTER 
IDVAR COUNTRY$
JOIN BIRTH_RT DEATH_RT 


The output is: 


Distance Metric is Euclidean Distance 
Single Linkage Method (Nearest Neighbor) 


Clusters Joining at Distance 
Sweden Finland 0.707 2 
UK Sweden 0.707 3 
Haiti Ethiopia 0.707 2 
Jamaica Chile 0.707 2 
France UK 1.000 4
Italy Spain 1.000 2
Haiti Sudan 1.000 3
Ecuador Turkey 1.000 2
France Germany 1.414 5
Canada France 1.414 6
Algeria Libya 1.414 2 
Somalia Haiti 1.414 4 
Trinidad CostaRica 1.414 2 
Italy Canada 1.581 8 
Hungary Italy 1.581 9 
Barbados Argentina 1.581 2 
Brazil Trinidad 1.581 3 
Ecuador Brazil 1.581 5 
Somalia Gambia 2.236 5 
Jamaica Barbados 2.236 4 
Jamaica Hungary 2.915 13 
Mali Guinea 2.915 2 
Somalia Mali 2.915 7 
Yemen Somalia 2.915 8 
Algeria Bolivia 3.162 3 
Jamaica Ecuador 3.606 18 
Jamaica Algeria Pir si 
Ira . 
Jamaica Yemen 6.083 30
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Clustering Countries by Birth and Death Rates. 


Cluster Tree
(Figure: single-linkage cluster tree of the 30 countries; axis: Distances)


Kernel Densities Ellipses and Modal Smoothers 


The input is: 


USE WORLDDM 
BEGIN 
PLOT DEATH_RT*BIRTH_RT / XMIN=0, XMAX=60, YMIN=0, YMAX=30, 
XTICK=6, SYMBOL=1, SIZE=.5, 
LABEL=COUNTRY$, SMOO=MODE, 
XLAB="Births per 1000 People (1990)", 
YLAB="Deaths per 1000 People (1990)" 
DEN DEATH_RT*BIRTH_RT / XMIN=0, XMAX=60, YMIN=0, YMAX=30,
XTICK=6, KERNEL, CONTOUR, ZTICK=10, ZPIP=0, 
AX=0, SC=0, 
TITLE="Birth and Death Rates for 30 Countries"
END
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The output is: 


Birth and Death Rates for 30 Countries
(Figure: Deaths per 1000 People (1990) plotted against Births per 1000 People (1990),
with country labels, a modal smoother, and kernel density contours)


Statistics 


Instructional Methods 


The INSTRDM data consists of measures of achievement on a biology exam for two
groups of students: one group was simply told to study everything from a biology text,
and the other was given terms and concepts that they were expected to master. An
additional covariate, the student's aptitude, is also included in the data set.


Variable Description 

STUDENT Student ID 

INSTRUCT$ Type of instruction given
INSTRUCT Coded variable for INSTRUCT$
APTITUDE Student's underlying ability to learn
ACHIEVE Student's score on the exam


From an education-theory standpoint, this data set is interesting because it
demonstrates the effect of different study instructions on achievement. A student is
likely to show a higher level of achievement when given specific instructions on what
to know for an exam than a student who gets only general instructions. From a
statistical standpoint, it demonstrates the importance of considering covariates when
using ANOVA models. A straight ANOVA of ACHIEVE on INSTRUCT shows no
significant difference at the 0.05 level, but after separating out some of the variance
using the covariate APTITUDE in an ANCOVA model, there is a significant difference
between instruction groups.

Potential analyses include ANOVA, ANCOVA, and regression. 


Analysis of Covariance 


The input is: 


USE INSTRDM 
GLM 
CATEGORY INSTRUCT / EFFECT 
MODEL ACHIEVE = CONSTANT + INSTRUCT + APTITUDE 


ESTIMATE 

The output is: 
Dependent Variable | ACHIEVE
N                  | 20
Multiple R         | 0.760
Squared Multiple R | 0.578

Estimates of Effects B = (X'X)^-1 X'Y

Factor   | Level  ACHIEVE
CONSTANT |        9.646
INSTRUCT | 1      -5.255
APTITUDE |        0.502

Analysis of Variance 

Source   | Type III SS  df  Mean Squares  F-ratio  p-value
INSTRUCT | 641.424      1   641.424       10.915   0.004
APTITUDE | 961.017      1   961.017       16.354   0.001
Error    | 998.983      17  58.764
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Least Squares Means 


(Figure: least squares means plotted by INSTRUCT)
Durbin-Watson D Statistic 2.197 


First Order Autocorrelation -0.171 


Scatterplot 


The input is: 


USE INSTRDM 
PLOT ACHIEVE * APTITUDE / GROUP=INSTRUCT$, OVERLAY,
BORDER=NORMAL, ELL, SMOOTH=LINEAR, FCOLOR=GRAY, SYMBOL=1, 8,
FILL,
TITLE="Effect of Instructional Methods on Exam Achievement"
The output is:

Effect of Instructional Methods on Exam Achievement
(Figure legend: INSTRUCT$ = GENERAL, SPECIFIC)
Toxicology 


Concentration of nicotine sulfate required to kill 50% of a group of common fruit flies

The WILLMSDM data contains the results of a bioassay conducted to determine the
concentration of nicotine sulfate required to kill 50% of a group of common fruit flies.
The experimenters recorded the number of fruit flies killed at different dosage levels.

Variable Description 

RESPONSE The dependent variable, which is the response of the 
fruit fly to the dose of nicotine sulfate (stimulus). 

LDOSE The logarithm of the dose. 

COUNT The number of fruit flies with that response. 


In bioassay, it is common to estimate the dose required to kill 50% of a target 
population. For example, a toxicity experiment may be conducted to establish the 
concentration of nicotine sulfate required to kill 50% of a group of common fruit flies. 
The goal is to identify the level of stimulus required to induce a 50% response rate, 
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where response may be any binary outcome variable and the stimulus is a continuous 
variate. In bioassay, stimuli include drugs, toxins, hormones, and insecticides; 
responses include death, weight gain, bacterial growth, and color change. 

Potential analyses include logistic regression and survival analysis. 


Logistic regression 


The input is: 


USE WILLMSDM 

FREQ=COUNT 

LOGIT 
MODEL RESPONSE=CONSTANT+LDOSE 
ESTIMATE 
QNTL 

LET LDOSEB=LDOSE-.4895
MODEL RESPONSE=LDOSEB 
ESTIMATE 

LET LDOSEB=LDOSE+2.634 
MODEL RESPONSE=LDOSEB 
ESTIMATE 


The output is: 


Case frequencies determined by value of variable COUNT
Categorical values encountered during processing are

Variables           | Levels
RESPONSE (2 levels) | 0.000 1.000

Binary LOGIT Analysis

Dependent Variable      : RESPONSE
Analysis is Weighted by : COUNT
Sum of Weights          : 25.000
Input Records           : 9
Records for Analysis    : 9

Sample Split

Category      | Count  Weighted Count
0 (REFERENCE) | 4      15
1 (RESPONSE)  | 5      10
Total         | 9      25.000


Log-Likelihood Iteration History

Log-Likelihood at Iteration1 | -17.329
Log-Likelihood at Iteration2 | -13.277
Log-Likelihood at Iteration3 | -13.114
Log-Likelihood at Iteration4 | -13.112
Log-Likelihood at Iteration5 | -13.112
Log-Likelihood               | -13.112
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Information Criteria 


AIC           |
Schwarz's BIC | 30.618

Parameter Estimates

                                                        95 % Confidence Interval
Parameter  | Estimate  Standard Error  Z      p-value   Lower   Upper
1 CONSTANT | 0.564     0.496           1.138  0.255     -0.408  1.536
2 LDOSE    | 0.919     0.394           2.334  0.020     0.147   1.691


Odds Ratio Estimates

                                           95 % Confidence Interval
Parameter | Odds Ratio  Standard Error     Lower  Upper
2 LDOSE   | 2.507

Log-Likelihood of Constants only Model = LL(0) : -16.825
2*[LL(N)-LL(0)]          : 7.427
df                       : 1
p-value                  : 0.006
McFadden's Rho-squared   | 0.221
Cox and Snell R-square   | 0.562
Nagelkerke's R-square    | 0.576
Evaluation Vector 
1 CONSTANT | 1.000 
2 LDOSE | VALUE 
Quantile Table 
Probability LOGIT LDOSE Upper Lower 
0.999 6.907 6.900 44.788 3.518 
0.995 5.293 5.145 33.873 2.536 
0.990 4.595 4.385 29.157 2.105 
0.975 3.664 3.372 22.875 1.519 
0.950 2.944 2.590 18.042 1.050 
0.900 2.197 1.777 13.053 0.530 
0.750 1.099 0.582 5.928 -0.445 
0.667 0.695 0.142 3.551 -1.047
0.500 0.000 -0.613 0.746 -3.364
0.333 -0.695 -1.369 -0.347 -7.392
0.250 -1.099 -1.809 -0.731 -9.987


0.001 -6.907 -8.127 -4.486 -49.055 
Case frequencies determined by value of variable COUNT
Categorical values encountered during processing are

Variables           | Levels
RESPONSE (2 levels) | 0.000 1.000
Binary LOGIT Analysis 

Dependent variable : RESPONSE 

Analysis is Weighted by : COUNT 

Sum of Weights : 25.000 

Input Records : 9 


Records for Analysis : 9


Sample Split 

Category | Count Weighted Count 
0 (REFERENCE) | 4 15
1 (RESPONSE) | 5 10
Total | 9 25.000


Log-Likelihood Iteration History 


Log-Likelihood at Iteration1 | -17.329
Log-Likelihood at Iteration2 | -15.060
Log-Likelihood at Iteration3 | -15.032
Log-Likelihood at Iteration4 | -15.032
Log-Likelihood at Iteration5 | -15.032
Log-Likelihood               | -15.032
Information Criteria 


AIC |
Schwarz's BIC | 32.261 


Parameter Estimates

                                         95 % Confidence Interval
Parameter | Estimate  Standard Error     Lower  Upper


Case frequencies determined by value of variable COUNT
Categorical values encountered during processing are

Variables           | Levels
RESPONSE (2 levels) | 0.000 1.000

Binary LOGIT Analysis

Dependent Variable      : RESPONSE
Analysis is Weighted by : COUNT
Sum of Weights          : 25.000
Input Records           : 9
Records for Analysis    : 9

Sample Split

Category      | Count  Weighted Count
0 (REFERENCE) | 4      15
1 (RESPONSE)  | 5      10
Total         | 9      25.000
Log-Likelihood Iteration History
Log-Likelihood at Iteration1 | -17.329
Log-Likelihood at Iteration2 | -15.055
Log-Likelihood at Iteration3 | -15.032
Log-Likelihood at Iteration4 | -15.032
Log-Likelihood at Iteration5 | -15.032
Log-Likelihood               | -15.032
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Information Criteria 


AIC | 32.064 
Schwarz's BIC | 32.262 


Parameter Estimates 


                                                      95 % Confidence Interval
Parameter | Estimate  Standard Error  Z      p-value   Lower  Upper
1 LDOSEB  | 0.312     0.159           1.968  0.049     0.001

                                           95 % Confidence Interval
Parameter | Odds Ratio  Standard Error     Lower  Upper
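In the binary logit model the fitted log-odds are CONSTANT + LDOSE * x, so the LDOSE value at which the fitted response probability equals p (the LDOSE column of the Quantile Table) is [log(p/(1 - p)) - CONSTANT]/LDOSE; at p = 0.5 this is the median lethal dose on the log-dose scale. The Python sketch below is illustrative, not SYSTAT's code. It also recomputes the likelihood-ratio statistic 2*[LL(N)-LL(0)] and the three pseudo R-square measures from LL(N) = -13.112 and LL(0) = -16.825. The Cox and Snell and Nagelkerke values are matched when n is the 9 input records rather than the 25.000 sum of weights; that choice is inferred from the printed numbers, not taken from SYSTAT documentation. The function names are hypothetical.

```python
import math


def ld_quantile(p, b0, b1):
    """LDOSE value at which the fitted logit model gives response probability p."""
    return (math.log(p / (1.0 - p)) - b0) / b1


def fit_summary(ll_model, ll_null, n):
    """Likelihood-ratio statistic and pseudo R-square measures for a binary logit fit."""
    lr = 2.0 * (ll_model - ll_null)                  # 2*[LL(N)-LL(0)]
    mcfadden = 1.0 - ll_model / ll_null              # McFadden's rho-squared
    cox_snell = 1.0 - math.exp(-lr / n)              # 1 - (L0/LN)^(2/n)
    nagelkerke = cox_snell / (1.0 - math.exp(2.0 * ll_null / n))  # rescaled to a maximum of 1
    return lr, mcfadden, cox_snell, nagelkerke


B0, B1 = 0.564, 0.919   # CONSTANT and LDOSE estimates from the output
for p in (0.25, 0.5, 0.75):
    print(p, round(ld_quantile(p, B0, B1), 3))
print([round(v, 3) for v in fit_summary(-13.112, -16.825, 9)])
```

The same n = 9 also reproduces the printed Schwarz's BIC of 30.618 as -2*LL + k*ln(n) with k = 2. Converting the LD50 from the LDOSE scale back to a concentration requires the base of the logarithm, which the variable description does not state.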


Plot of Logistic Model 


The input is: 


USE WILLMSDM 
FREQ=COUNT 
LOGIT 
MODEL RESPONSE=CONSTANT+LDOSE 
ESTIMATE 
SAVE QUANT 
QNTL 


REM CREATES PLOT OF LOGISTIC MODEL WITH LIMIT LINES ADDED AT THE UPPER
REM AND LOWER LIMITS FOR THE LDOSE VALUE CORRESPONDING TO A
REM PROBABILITY OF .50
USE QUANT
BEGIN
PLOT PROB*LDOSE / SIZE=0 XLAB=" " YLAB=" " XLIMIT=-3.364, .746,
XMIN=-5 XMAX=5 XTICK=4, ACOLOR=RED YTICK=4, YMAX=1 YMIN=0
PLOT PROB*LDOSE / SIZE=0 SMOOTH=SPLINE TENSION=0.500,
XMIN=-5 XMAX=5 XTICK=4 XLAB="LDOSE", YLAB="Probability",
YLIMIT=0.5 YTICK=4 YMAX=1, YMIN=0
USE WILLMSDM
LET PDEAD=COUNT/5
SELECT (RESPONSE=1)
PLOT PDEAD*LDOSE/SYM=2 YTICK=4 YMAX=1 YMIN=0 XMIN=-5, XMAX=5,
XTICK=4, XLAB=" " YLAB=" " SCALES=NONE, TITLE="Logistic Model"
END
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The output is: 


Logistic Model 


Probability 


Data References 


Anthropology Data Sources 


Original Source. Thomson, A. and Randall-MacIver, R. (1905). Ancient races of the
Thebaid. Oxford: Oxford University Press. 

Data Reference. Hand, D. J., Daly, F., Lunn, A.D., McConway, K.J., and Ostrowski, E. 
(1994). A handbook of small data sets. New York: Chapman & Hall. pp. 299-301. 

Manly, B.F.J. (1986). Multivariate statistical methods. New York: Chapman & Hall. 

STATLIB. http://lib.stat.cmu.edu/DASL/Datafiles/EgyptianSkulls.html


Astronomy Data Source 


Original Source. Waldmeier, M. (1961). The sunspot activity in the years 1610-1960.
Zurich: Schulthess and International Astronomical Union Quarterly Bulletin on Solar 
Activity. Tokyo. 

Data Reference. Andrews, D.F. and Herzberg, A.M. (1985). Data, pp. 67-76. Springer- 
Verlag. 
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Biology Data Source 


Data Source. Carey, J.R., Liedo, P., Orozco, D., and Vaupel, J.W. (1992). Slowing of
mortality rates at older ages in large medfly cohorts. Science, 258, 457-461.

Data Reference. STATLIB http://lib.Stat.cmu.edu/DASL/Datafiles/Medflies.html 

Data Source. Allison, T. and Cicchetti, D. V. (1976). Sleep in mammals: Ecological and
constitutional correlates. Science, 194, 732-734.


Chemistry Data Sources 


Original Source. Adapted from a conference session on statistical computing (Greco et al., 
1982). 

Data Reference. Wilkinson, L. and Engelman, L. (1996). SYSTAT 6.0 for Windows:
Statistics, pp. 487-488, SPSS Inc. 


Engineering Reference 


Devor, R.E., Chang, T. and Sutherland, J.W. (1992). Statistical quality design and control, 
pp. 756-761. New York: MacMillan. 


Environmental Science Sources 


Original Source. Lange, Royals, and Connor (1993). Transactions of the American
Fisheries Society.
Data Reference. STATLIB http://lib.Stat.cmu.edu/DASL/Datafiles/MercuryinBass.html 


Genetics Data Sources 


Data Source. Rao, C. R. (1973). Linear Statistical Inference and its Applications. 2nd 
edition, New York: John Wiley & Sons. 

McLachlan, G.J. and Krishnan, T. (1997). The EM algorithm and extensions. New York:
John Wiley & Sons. 
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Manufacturing Data Sources 


Original Source. Messina, W.S. (1987). Statistical quality control for manufacturing 
managers. New York: Wiley. 

Data Reference. Stenson, H. and Wilkinson, L. (1996). SYSTAT 6.0 for Windows:
Graphics, pp. 291-369, SPSS Inc.


Medicine Data Sources 
Original Source. Cameron, E. and Pauling, L. (1978). Supplemental ascorbate in the 
supportive treatment of cancer: Reevaluation of prolongation of survival times in 
terminal human cancer. Proc. Natl. Acad. Sci. U.S.A., 75, 4538-4542.


Data Reference. Andrews, D.F. and Herzberg, A.M. (1985). Data, pp. 203-207. Springer- 
Verlag. 


Medical Research Data Reference 


Wilkinson, L. and Engelman, L. (1996). SYSTAT 7.0: New Statistics, p. 235, SPSS Inc.


Psychology Data Reference 
Wilkinson, L., Blank, G. and Gruber, C. (1996). Desktop data analysis with SYSTAT. 
Upper Saddle River, NJ: Prentice Hall, p.454. 


Stouffer, S.A., Guttman, L., Suchman, E.A., Lazarsfeld, P.F., Star, S.A., and Clausen, J.A.
(1950). Measurement and prediction. Princeton, NJ: Princeton University Press.


Sociology Data Reference 


Wilkinson, L., Blank, G. and Gruber, C. (1996). Desktop data analysis with SYSTAT. 
Upper Saddle River, NJ: Prentice Hall, p.738. 


Statistics Data Sources 


Original Source. Huitema, B.E. (1980). The Analysis of covariance and alternatives. New 
York: John Wiley & Sons. 
Data Reference. Wilkinson, L., Blank, G., and Gruber, C. (1996). Desktop data analysis
with SYSTAT. Upper Saddle River, NJ: Prentice Hall, p. 442. 


Toxicology Data Source 


Hubert, J.J. (1991). Bioassay. 3rd ed. Dubuque, Iowa: Kendall Hunt.


Appendix 
Data Files


SYSTAT software comes with a folder of data files, which can be accessed through 
the File=>Open=>Data dialog. The folder contains over 350 files of data used in the 
nearly 600 examples provided in the user manual and online help. This Appendix 
gives details of these files, with sources of data, a brief description of the study which 
generated the data, and a description of the variables in the file. 


These data files contain not only the data but also a great deal of information about
the data. The information given in this Appendix is available in the data file itself.
After you open a file in the Data Editor by clicking its name in the dialog, hover the
mouse over the corner rectangle (the top left cell) to see the general information on
the file. Information on an individual variable appears in the Comments box at the
bottom of that variable's Variable Properties dialog, which you can open by selecting
the variable name and choosing Data=>Variable Properties, or simply by right-clicking
the variable name in the data file. The same information is also shown as a tooltip
when you move the mouse over the variable name.


For a data file you create, you can enter this general file information in the File
Comments dialog, which can be opened by right-clicking on the file name in the Data
Editor or on the top left cell. Information on individual variables may be entered in
the Comments box of the Variable Properties dialog.


The data file contains even more information, which can be seen by clicking the
Variable tab in the Data Editor to open the Variable Editor. For each variable, this
shows its name, label, value labels, type (string or numeric), whether it is
categorical, the number of characters, the number of decimals, the display type, and
comments. It also shows which variables are involved in case selection and which have
been designated as frequency or weight variables, BY-group variables, category
variables, or order variables.


The following data files are ‘Read only’: 


ACCIDENT: Jobson (1996). The data set relates to automobile accidents in Alberta, Canada. The
variables are: SEATBELT$, IMPACTS$, INJURY$, DRIVERS, FREQ.


ADAPTOR: The adaptor body is one of the components of a machine. Its outer diameter is
denoted by DIA. The data set contains the DIA of 16 adaptor bodies produced over a period
of 16 hours, one in each hour. The total time period is divided into two periods of eight hours
each, and the variable EIGHT takes the value 1 or 2 depending on the period of production.
The variables FOUR and TWO are constructed similarly. Thus the design is a nested one, with
FOUR nested inside EIGHT and TWO nested inside FOUR. The variables are: DIA,
EIGHT, FOUR, TWO.


ADJADAPTOR: The data set consists of the outer diameter of a component named adaptor body,
before and after correction. The two variables are BEFORE and AFTER.


ADMIRE: Cohen and Brook (1987). In a large-scale longitudinal study of childhood and
adolescent mental health, data were obtained on personal qualities that the subjects admired
and what they thought other children admired, as well as the sex and age of the subjects. The
admired qualities were organized into scales for antisocial, materialistic, and conventional
values for the self and as ascribed to others. In one phase of the investigation, the researchers
wanted to study the relationship between the self set and the others set. However, several of
these scales exhibited sex differences, were nonlinearly (specifically quadratically) related to
age, and/or were differently related to age for the sexes. For the self-other association to be
assessed free of the confounding influence of age, sex, and their interactions, it was desirable
to partial those effects out of the association. Using SYSTAT, the product SEX times AGE
and the squared terms were created. The variables are: ID$, ANTISO_S, MATER_S, CONVEN_S,
ANTISO_O, MATER_O, CONVEN_O, AGE, SEX, AGESQ, SEXAGE, SEXAGESQ.


ADMIT: Graduate Record Examination Verbal (GREV) and Quantitative (GREQ) scores, with a
binary indicator of whether or not a student was awarded a Ph.D. (PHD$) in a graduate
psychology department. The variables are: YEAR, GPA, GREV, GREQ, GRE, PHD,
GROUP, N, PHD$.
AEROSOL: Beckman, Nachtsheim and Cook (1987). This is a study of high-efficiency particulate
air (HEPA) cartridges. Two aerosol types (AEROSOL) were used to test three HEPA
respirator filters (FILTER) from each of two different manufacturers (MANUFACTURER).


AFIFI: Afifi and Azen (1974). The dependent variable, SYSINCR, is the increase in systolic blood
pressure after administering one of four different drugs (DRUG) to patients with one of three 
different diseases (DISEASE). Patients were assigned randomly to one of the four possible 
drugs. 

AGE1: The data set consists of two variables, AGE$ and SEX$.

AGESEX: U.S. Census (1980). These data show the distribution of MALES and FEMALES within
age groups. The variable AGE labels each age group by the upper age limit of its members. 

AGESTAT: The data set consists of randomly generated data on two variables, AGE and SEX$.

AGR1 and AGR2: These data sets consist of hypothetical agricultural data, in which crop yields
are related to the soil type and the type of fertilizer used. The variables are: YIELD,
FERTILIZER, and SOIL.

AIAG: Breyfogle (2003). This data set originated from the Automotive Industry Action Group
(AIAG) (1995). It contains measurements of a critical quality characteristic
(MEASURE) on 80 samples, 5 collected in each of 16 subgroups (SUBGROUP).

AIRCRAFT: Bennett and Desmarais (1975). These data show amplitude of vibration (FLUTTER)
versus time (TIME) in an aircraft wing component. 

AIRLINE: Box et al. (1994). The variable PASS contains monthly totals of international airline
passengers for 12 years beginning in January 1949.

AKIMA: Akima (1978). These data are topological measurements of a
three-dimensional surface using the variables X, Y, and Z. 

AM: Borg and Lingoes (1987), adapted from Green and Carmone (1970). This unfolding data set
contains similarities only between the points delineating ‘A’ and ‘M’, and these similarities
are treated only as rank orders. Variables include A1 through A76 and ROWS.

ANNEAL: Brownlee (1960). The experiment compares two different annealing methods
for making cans. Three coils (COIL) of material were selected from the populations of coils
made by each of the two methods (METHOD). A pair of samples was drawn from each of two
locations (LOCATION) on the coil. The response is the life (LIFE) of the can.

ANSFIELD: Ansfield et al. (1977). This study examines the effects (RESPONSES) of treatments
(TREATS) on two patient groups (CANCER$), those with cancer of the colon or rectum and 
those with breast cancer. NUMBER gives the number of patients in each 
cancer/treatment/response group. 
ANXIETY: Data are from a National Longitudinal Survey of Young Men conducted in 1979. The
data set has been extracted from the data set NLS.


BANK: The data set consists of descriptions of bank employees. The variables are:

WEIGHT
ID Employee code
SALBEG Beginning salary
SEX Sex of employee: 0 = Male, 1 = Female
TIME Job seniority (in months)
AGE Age of employee (in years)
SALNOW Current salary
EDLEVEL Educational level
WORK Work experience
Employment category: 1 = Clerical, 2 = Office trainee, 3 = Security officer,
4 = College trainee, 5 = Exempt employee, 6 = MBA trainee, 7 = Technical
MINORITY Minority classification: 0 = White, 1 = Nonwhite
SEXRACE Sex & race classification: 1 = Black Females, 2 = White Females,
3 = Black Males, 4 = White Males


BARLEY: Fisher (1935). The data are the yields of 10 varieties of barley in two years (1931 and
1932) at 6 sites in the Midwestern US. The variables are: Y1931, Y1932, VARIETY$, SITE$.


BBD: Myers & Montgomery (2002). This data set contains observations on viscosity
(VISCOSITY) at different level combinations of three factors: temperature (TEMP),
agitation (AGITATION), and rate of addition (RATE). Each factor has 3 levels.


BIRTHS: Walser (1969). The data set consists of information on the FREQUENCY of births in
each MONTH (labeled as 1,2,...,12) of a year in the University Hospital of Basel, Switzerland. 




BIRTHS2: Conover (1999). These data were collected in a survey conducted in 7 hospitals of a
certain city over a 12-month period divided into 4 seasons (SEASON$), and the numbers of
newborn babies (BIRTHS) in each season were obtained. The variables are: BIRTHS,
SEASON$, HOSPITAL$.


BIT5: The file contains five-item binary profiles that fit a two-dimensional structure perfectly.
Variables in the SYSTAT data file are X(1) through X(5).


BLOCK: Neter et al. (1996). These data come from a randomized block design. Five blocks of
judges (BLOCK) evaluated three treatments (TREAT). Subjects (judges) are stratified within
blocks, so the interaction of blocks and treatments cannot be analyzed. The response
variable is JUDGMENT.

BLOCKCCD: Myers & Montgomery (2002). This data set contains observations on the yield of
a chemical process (YIELD) at different level combinations of two factors, time (TIME)
and temperature (TEMP), on 14 experimental units. Two different batches of raw
material were used; the variable BLOCK identifies the batch.


BOARDS: Montgomery (2001). This is an aggregated data set on the number of nonconformities
found in 26 successive samples of 100 circuit boards. For convenience, the sample unit (or
inspection unit) is defined as 100 boards. That is, although each sample contains 100 boards,
each sample is considered a sample of size 1 from a Poisson distribution. The variables are:

SAMPLE Identifier
DEFECTS A total count of the number of defects in each group of 100 boards
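
Because each group of 100 boards is treated as a single Poisson observation, the mean number of defects per inspection unit can be estimated by averaging the 26 counts, which is the maximum-likelihood estimate of a Poisson mean. A minimal worked form, where the symbols λ and c̄ are notation introduced here rather than variables in the file:

```latex
\hat{\lambda} = \bar{c} = \frac{1}{26}\sum_{i=1}^{26} \mathrm{DEFECTS}_i
```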


BOD: Bates and Watts (1988). Marske created these data from stream samples in 1967. Each
sample bottle is inoculated with a mixed culture of microorganisms, sealed, incubated, and 
opened periodically for analysis of dissolved oxygen concentration. The variables are DAYS 
and BOD. 


BOOKPREF: Conover (1999). The data set consists of the number of books sold in a week in 12
bookstores of four booksellers. The variables are: BOOKS, STORE, BOOKSELLER.


BOSTON: Belsley, Kuh, and Welsch (1980). The data set contains Boston housing prices and is also
used in Breiman et al. (1984). The variables are: CRIM, ZN, INDUS, CHAS, NOX, RM, AGE, DIS,
RAD, TAX, PTRATIO, B, LSTAT, MEDV.


BOXES: Messina (1987). The electrical resistance (in ohms) of computer boxes is measured for
five randomly selected boxes from each of 20 days of production. Thus, each SAMPLE 
contains five observations of resistance in OHMS for each of 20 days (DAY). 
BP: Hand et al. (1996). The data set gives the supine systolic and diastolic blood pressures (mm
Hg) for 15 patients with moderate essential hypertension, immediately before and two hours
after administering the drug captopril. The variables are:

SYSBP_BEFORE Systolic blood pressure (mm Hg) before administering captopril
SYSBP_AFTER Systolic blood pressure (mm Hg) 2 hours after administering captopril
DIABP_BEFORE Diastolic blood pressure (mm Hg) before administering captopril
DIABP_AFTER Diastolic blood pressure (mm Hg) 2 hours after administering captopril


BRODLIE: Brodlie (1980). These data are X and Y coordinates taken from a figure in Brodlie’s
discussion of cubic spline interpolation. 

BULB: Mendenhall et al. (2002). A manufacturer of industrial light bulbs tries to control the
variability in the length of life of the light bulbs so that the standard deviation is less than
150 hours. The data consist of the LIFETIME of 20 bulbs.


BUSES: Davis (1977). These data count the number of buses failing (COUNT) after driving 1 of
10 distances (DISTANCE). 

CANCER: Morrison (1990); Bishop et al. (1975). These studies examined breast cancer patients
in three diagnostic centers (CENTER$), three age groups (AGE), whether they survived after 
three years post-diagnosis (SURVIVE$), and the inflammation type (minimum/maximum) and 
appearance of the tumor (TUMOR$) (malignant/benign). The variable NUMBER contains the 
number of women in each cell. 

CANCERDM: Cameron and Pauling (1978). The data set contains information from a study of
the effects of supplemental vitamin C as part of routine cancer treatment for 100 patients and
1000 controls (10 controls for each patient). The variables are:

CASE Case ID
ORGANS Organ affected by cancer
SEX$ Sex of patient
AGE Age of patient
SURVATD Survival of patient measured from first hospital attendance
CNTLATD Survival of control group from first hospital attendance
SURVUNTR Survival of patient from time cancer deemed untreatable
CNTLUNTR Survival of control from time cancer deemed untreatable
LOGSURVA Logarithm of SURVATD
LOGCNTLA Logarithm of CNTLATD
LOGSURVU Logarithm of SURVUNTR
LOGCNTLU Logarithm of CNTLUNTR


CARDOG: Wilkinson (1975). This data set contains the INDSCAL configurations of the scalings
of cars and dogs. The variables are: CAR$, DOG$, C1, C2, D1, D2.


CARS: The data set reflects the attributes of selected performance cars. The variables are:
ACCEL, BRAKE, SLALOM, MPG, SPEED, NAME$.


CEMENT: Birkes and Dodge (1993). The data set consists of the amounts of four ingredients
(INGREDIENT1, INGREDIENT2, INGREDIENT3, INGREDIENT4) and the corresponding
heat evolved (HEAT).


CHOICE: McFadden (1979). The data set consists of hypothetical data. The CHOICE variable
represents which of three transportation alternatives (AUTO, POOL, TRAIN) each subject prefers.
The first subscripted variable in each CHOICE category represents TIME and the second,
COST. Finally, SEX$ represents the gender of the chooser, and AGE represents the age of the
chooser.


CHOLESTEROL: The data set records the age and blood cholesterol levels for two groups of
women. Women in the first group use contraceptive pills; women in the second group do not.
A PILL value of 1 indicates that the woman takes the pill; a value of 2 indicates that she does
not. Each case has the cholesterol value (CHOL) for a pill user or for her age-matched control,
together with AGE.


CITIES: Hartigan (1975). The data set is a dissimilarity matrix consisting of airline distances in
hundreds of miles between ten global cities: BERLIN, BOMBAY, CAPETOWN, CHICAGO, 
LONDON, MONTREAL, NEW YORK, PARIS, SANFRAN, and SEATTLE. 


CITYTEMP: These data consist of low and high July temperatures for eight U.S. cities in 1992.


CLINCOV: Hocking (2003). This example is based on a clinical data set in which a pharmaceutical
firm wants to test a new drug for a particular disease. The response is a measure of the
improvement in the patients’ status. A sample of three clinics (CLINIC) is selected
at random from a large population of clinics. From each clinic, a sample of ten patients with
the disease is selected. The drug is applied to each patient, and the response (Y) to the
drug as well as a relevant physical characteristic (Z) is recorded for each patient.


CLOTH: Montgomery (2001). Here, the occurrences of nonconformities (DEFECTS) in each of
10 rolls of dyed cloth were counted (ROLL). The rolls were not all the same size in square 
meters. Thus, the sample unit was defined as 50 square meters of cloth, and roll sizes were 
expressed in these units (UNITS). 
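
The conversion described above can be sketched in Python. This is a hypothetical helper, not SYSTAT code, and the example roll is made up rather than taken from CLOTH. It assumes a roll's size is known in square meters, divides that area by 50 to get UNITS, and then divides DEFECTS by UNITS to get the nonconformities per inspection unit.

```python
# Hypothetical sketch (not part of SYSTAT): express a roll's size in the
# 50-square-meter inspection units used for CLOTH and compute the number
# of nonconformities per inspection unit.
SQUARE_METERS_PER_UNIT = 50.0


def inspection_units(area_m2: float) -> float:
    """Convert a roll area in square meters to inspection units (UNITS)."""
    return area_m2 / SQUARE_METERS_PER_UNIT


def defects_per_unit(defects: int, area_m2: float) -> float:
    """Nonconformities per inspection unit, i.e. DEFECTS / UNITS."""
    units = inspection_units(area_m2)
    if units <= 0:
        raise ValueError("roll area must be positive")
    return defects / units


if __name__ == "__main__":
    # Illustrative numbers only: a 125 square-meter roll with 10 defects.
    print(inspection_units(125.0))      # 2.5
    print(defects_per_unit(10, 125.0))  # 4.0
```

Normalizing by UNITS is what allows rolls of different sizes to be compared on a common per-unit scale.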
COBDOUG: Judge et al. (1988). The data set is related to the Cobb-Douglas production function
in econometrics. The Cobb-Douglas production function describes the effect of labor (L) and
capital invested (K) on output (Q). The data set consists of 20 observations on
the variables Y, X1, and X2, where Y = ln Q, X1 = ln L, and X2 = ln K.
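
Taking natural logarithms is what makes these variables useful, because it turns the multiplicative Cobb-Douglas form into a model that is linear in Y, X1, and X2. In the sketch below the constant A and the exponents β1 and β2 are notation introduced here rather than variables in the file, and the error term is omitted.

```latex
Q = A\,L^{\beta_1} K^{\beta_2}
\;\Longrightarrow\;
\ln Q = \ln A + \beta_1 \ln L + \beta_2 \ln K
\;\Longleftrightarrow\;
Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2, \qquad \beta_0 = \ln A
```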


CODDER: These data contain the percentage of reader attention (PERCENT) in a certain
geographical area (LOCUS$) for the local newspaper. 


COFFEE: Hand et al. (1996). The data set contains the prices (in pence) of a 100 gm pack of a
particular brand of instant coffee on sale in 15 different shops in Milton Keynes on the same
day in 1981, together with the amount (in gm) per penny. The variables are: PRICE, GM_PER_PENCE.


COLAS: Schiffman, Reynolds, and Young (1981). These data consist of judgments by 10 subjects
of the dissimilarity (0-100) between pairs of colas, including DIETPEPS, RC, YUKON, 
PEPPER, SHASTA, COKE, DIETPEPR, TAB, PEPSI, and DIETRITE. 


COLOR: These data provide the proportions of RED, GREEN, and BLUE that will produce the
color specified in COLOR$.

COLRPREF: The data set contains color preferences (RED, ORANGE, YELLOW, GREEN,
BLUE) among 15 people (NAME$) for five primary colors.

COMBAT: Stouffer et al. (1950). This data set contains reports of fear symptoms by selected U.S.
soldiers after being withdrawn from World War II combat. Nine symptoms are included for
analysis, and the number of soldiers with each symptom profile is reported. The variables are:

COUNT Number of soldiers with each symptom profile
POUNDING Violent pounding of the heart
SINKING Sinking feeling in the stomach
SHAKING Shaking or trembling all over
NAUSEOUS Feeling sick to the stomach
STIFF Cold sweat
FAINT Feeling of weakness or feeling faint
VOMIT Vomiting
BOWELS Loss of bowel control
URINE Loss of urinary control


COMFORT: Milliken and Johnson (1992). An experiment on the effects of temperature on the
comfort level of 18 men and 18 women was carried out using nine environmental chambers. Each
of three temperatures (65F, 70F, and 75F) was assigned to three randomly selected chambers.
Two randomly selected men and two randomly selected women were assigned to each chamber.
The comfort of each person was measured after three hours on a scale of 1 to 15, where
1 = cold, 8 = comfortable, and 15 = hot. The variables are: TEMP, GENDER, PERSON, CHAMBER,
COMFORT.

COMPUTER: Montgomery (2001). The following data represent the results of inspecting all units of
a personal computer produced for 10 consecutive days (DAY). UNITS are the number of computers 
inspected each day, and NONCON is the number of nonconforming units found. 


CONDENSE: Messina (1987). The data file contains nonconformance data (defects) for 15 lots of
condensers. LOT$ is the lot number, TYPE$ is the type of defect, and TALLY is the frequency of a
particular defect in a particular lot. One thousand condensers were inspected in each lot.


COVAR: Winer (1971). Winer uses this artificial data set in an analysis of covariance in which Y
is the dependent variable, X is the covariate, and TREAT is the treatment. 


COVSTRUCT: Hypothetical data. The variables are: P, Q, Y.


COX: Cox (1970). These data record tests for failures among objects after certain times (TIME).
FAILURE is the number of failures, and COUNT is the total number of tests. 


CRABS: Wilkinson (2005). These data record the locations of 23 fiddler-crab holes in an 80 x 80
centimeter area of the Pamet River marsh in Truro, Massachusetts. The variables are:
CRAB, X, Y.

CRIMERW: Clausen (1998). These data give case-by-case information about crimes in three
different areas of Norway. The following lists the three areas and the three crimes.
The SYSTAT names are within parentheses.

PLACE$ CRIME$
Mid Norway (Mid N) Burglary
North Norway (NorthN) Fraud
Oslo Area (Oslo) Vandalism


CRIMESTAT: FBI Uniform Crime Reports (1985). The data set consists of arrests by sex for
selected crimes in United States in 1985. The variables are - CRIME$, MALES, FEMALES. 


CROPS: Milliken and Johnson (1984). These agricultural data consist of yields in pounds
(YIELD) of two varieties of wheat (VARIETY) grown under four different fertility regimes (FERT).
To compare the four fertilizers and two varieties, four whole plots were grouped into two
blocks (BLOCK). The two varieties were assigned randomly to the two whole plots in each
block. Each whole plot is split into four subplots, and the four fertilizers are applied randomly
to these.

DAYCREDM: Wilkinson, Blank, and Gruber (1996). This data set consists of three measures of
a child’s social competence: a measure of behavior at dinner, a measure of behavior in
dealing with strangers, and one involving social problem solving in a cognitive test. In
addition, there is a categorical variable for the setting in which a child was raised: by
parents, by a babysitter, or in a daycare center. The variables are:

SETTING$ Daycare setting in which child is raised
SETTING Coded setting
DINNER Behavioral measure of skill during dinner
STRANGER Measure of skill in dealing with a stranger
PROBLEM Social problem-solving skills in a cognitive test


DELTIME: Montgomery, Peck, and Vining (2001). The data set deals with 25 delivery times of
vending machines. The delivery time (DELTIME) of these machines is affected by the
number of cases of product stocked (CASES) and the distance walked by the route driver
(DISTANCE).


DESIGNDM: Devor, Chang, and Sutherland (1992). The data set consists of the results of an
experiment designed to improve the performance of a fuel gauge. The variables are:


RUN The case ID 

SPRING Dummy variable for the type of spring used 

POINTER Dummy variable for the type of pointer used 

VENDOR Dummy variable for the vendor used 

ANGLE Dummy variable for the type of angle bracket used 
READING The reading of the fuel gauge under the designed conditions 


DIVORCE: Wilkinson, Blank, and Gruber (1996) and originally from Long (1971). This data set
includes grounds for divorce in the United States in 1971. 


DOPTIMAL: Myers and Montgomery (2002). The data set is from an experiment based on a D-optimal
design for adhesive bonding, where the factors are amount of adhesive (X1) and cure
temperature (X2). The response is the pull-off force (Y).


DOSE: These data are from a toxicity study of a drug designed to combat tumors. The data show
the proportion of laboratory rats dying (RESPONSE) at each dose level (DOSE) of the drug.
LOGDOS is the dose in natural logarithm units.


ECLIPSE: These data are from the National Aeronautics and Space Administration web site and
represent the longitude and latitude for the paths of eight future solar eclipses. Measurements
occur at two-minute intervals. The data are used courtesy of Fred Espenak, NASA/GSFC. The
variables are:

MAPNUM ID number
TIME$ Time (universal time) at which the eclipse will begin at the latitude/longitude for that case
MAXLAT Northernmost latitude of total obstruction
MAXLON Northernmost longitude of total obstruction
MINLAT Southernmost latitude of total obstruction
MINLON Southernmost longitude of total obstruction
LABLAT Center latitude of total obstruction
LABLON Center longitude of total obstruction
RATIO Ratio of diameters of the Moon and the Sun
ALT Altitude above horizon at the given latitude/longitude
AZIMUTH Azimuth at which eclipse will occur
WIDTH Width of the path of total obstruction
TOTALITY$ Time period of total obstruction at centerline
AUG_11_1999 Indicator for eclipse beginning on this date
JUN_21_2001 Indicator for eclipse beginning on this date
DEC_14_2001 Indicator for eclipse beginning on this date
JUN_10_2002 Indicator for eclipse beginning on this date
DEC_4_2002 Indicator for eclipse beginning on this date
MAY_31_2003 Indicator for eclipse beginning on this date
APR_8_2005 Indicator for eclipse beginning on this date
OCT_3_2005 Indicator for eclipse beginning on this date
LABEL$ Variable used for labeling eclipses on graphs


EDUCATN: This data set is a subset of the data set SURVEY2.


EGGS: Bliss (1967). An experiment was conducted to test the performance of laboratories and
technicians in determining the fat content of dried eggs. A single can of dried eggs was stirred
well. Samples were drawn, and a pair of samples (claimed to be of two "types") was sent to
each of six commercial laboratories to be analyzed for fat content. Each laboratory assigned
two technicians, who each analyzed both "types". The variables are:

FAT Fat content as a percentage
LAB Lab which ran the experiment
TECHNICIAN Technician code
SAMPLE Sample type used
EGYPTDM: Thomson and Randall-Maciver (1905). This data set consists of four measurements
of male Egyptian skulls from five different time periods ranging from 4000 B.C. to 150 A.D.
The four measurements of male Egyptian skulls are:


MB Maximal breadth of skull 
BH Basibregmatic height of skull 
BL Basialveolar length of skull 
NH Nasal height of skull 

YEAR Time of measurement 


EKMAN: Ekman (1954). These data are judged similarities among 14 different spectral colors.
The variable names are the colors’ wavelengths: W584, W600, W610, W628, W651,
W434, W445, W465, W472, W490, W504, W537, W555, and W674. The judgments are averaged
across 31 subjects.


ELECSORT: This data set is obtained by sorting the data file ELECTION by the variable NAME$.


EMF: The data set consists of counts of patients in urban and suburban areas, classified by EMF
exposure and by whether or not they were affected by cancer. The variables are: CANCER$, EMF$,
RESIDENCE$, COUNT.


ENERGY: SYSTAT created this file to demonstrate error bars. The variable SE determines the
length of the error bar. ENERGY$ is determined as low, medium, and high. 


ENZYMDM: Greco et al. (1982). The data set consists of measurements of an enzymatic reaction,
measuring the effects of an inhibitor on the reaction velocity of an enzyme and substrate.


ENZYME: Greco et al. (1982). These data measure competitive inhibition for an enzyme
inhibitor. V is the initial enzyme velocity, S is the concentration of the substrate, and I is the
concentration of the inhibitor.


ESTIM: The data set consists of the estimated parameters for each sample of the data set
ENZYMDM. 


EURONEW: A subset of the WORLD data. These data include 27 European countries. The
variable LABLAT is the latitude measurement of the capital, and LABLON is the longitude. 


EX1: Wheaton, Muthén, Alwin, and Summers (1977). The data file is a covariance matrix of 6
manifest variables. The original data are attitude scales administered to 932 individuals in 
1967 and 1971. The attitude scales measure anomia (ANOMIA), powerlessness (POWRLS), 
and alienation (ALNTN). They also include a variable for socioeconomic index (SEI), 
socioeconomic status (SES), and years of schooling completed (EDUCTN). 


EX2: Duncan, Haller, and Portes (1971). The data are a correlation matrix of manifest variables.
The original data measure peer influences on ambition. These data include the respondent’s
parental aspiration (REPARASP), socioeconomic status (RESOCIEC), intelligence
(REINTGCE), occupational aspiration (REOCCASP), and educational aspiration
(REEDASP). These data also include the respondent’s best friend’s intelligence (BFINTGCE),
socioeconomic status (BFSOCIEC), parental aspiration (BFPARASP), occupational
aspiration (BFOCCASP), and ambition (BFAMBITN).


EX3: Mels and Koorts (1989). These data are taken from a job satisfaction survey of 213 nurses.
There are 10 manifest variables that serve as indicators of four latent variables: job security 
(JOBSEC), attitude toward training (TRAING), opportunities for promotion (PROMOT), and 
relations with superiors (RELSUP). 


EX4A and EX4B: Lawley and Maxwell (1971). These data comprise a correlation matrix of nine
ability tests administered to 72 children. 


EXER: The data consist of people who were randomly assigned to two different diets (DIET),
low-fat and not low-fat, and three different types of exercise (EXERTYPE): at rest, walking
leisurely, and running. A baseline pulse measurement (PULSE) was obtained at time = 0 for every
individual in the study. However, subsequent pulse measurements were taken at less regular
time intervals. The second pulse measurement was taken at approximately 2 minutes (time
= 120 seconds), the third at approximately 5 minutes (time = 300 seconds), and the fourth and
final at approximately 10 minutes (time = 600 seconds).


EXPORTS: Hand, Daly, Lunn, McConway, and Ostrowski (1994). This data set consists of the
value (in millions of £) of British exports (EXPORTS) during the years 1820 to 1850 (YEAR).


FLEA: Lubischew (1962). The data set consists of measurements of the following four variables
on two species (SPECIES) of flea beetles:

X1 Distance of the transverse groove to the posterior border of the prothorax (in microns)
X2 Length of the elytra (in mm)
X3 Length of the second antennal joint (in microns)
X4 Length of the third antennal joint (in microns)


FLEABEETLE: Hand et al. (1996). Data were collected on the genus of flea beetle Chaetocnema,
which contains three species (SPECIES$): concinna (Con), heikertingeri (Hei), and 
heptapotamica (Hep). Measurements were made on the width and angle of the aedeagus of 74 
beetles. The goal of the original study was to form a classification rule to distinguish the three 
species. The data set consists of only measurements of angle of aedeagus of beetles. The 
variables are - ANGLE, SPECIES$. 

FOOD: These data were gathered from food labels at a grocery store. The variables are:


BRAND$ Shortened name for brand 
FOOD$ Type of dinner: chicken, pasta, or beef 
CALORIES Calories per serving 
FAT Grams of fat
PROTEIN Grams of protein 
VITAMINA, CALCIUM, IRON Percentage of daily value of vitamin A, calcium, and iron 
COST Price per dinner 
DIET$ Yes if low in calories; no if standard 


FOREARM1: Pearson and Lee (1903). The data set consists of ARMLENGH, the length of the
forearm (in inches) of 140 men.

FOSSILS: The data give the incidence of fossil specimens of various flora found at various
elevations of a site in British Columbia. The variables are: HEIGHT, CHARA, NITALLA,
JUNCUS, RUMEX.


FRACTION: These data are from a half fraction of a 2⁴ factorial design. Each cell contains two
observations on a Y variable.

FRTFLYDM: Carey, Liedo, Orozco, and Vaupel (1992). This data set contains information on
mortality rates for Mediterranean fruit flies over 172 days, after which all flies were dead. 
Experimenters recorded the number of flies dying each day (DAY) and divided this by the 
number alive (LIVING) at the beginning of the day to measure mortality rate (MORTRATE) 
for each day. 
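
The mortality rate defined above can be sketched in Python. This is a hypothetical helper, not SYSTAT code, and the counts in the example are made up rather than taken from FRTFLYDM. It assumes the only input is the number of flies alive at the start of each day and that flies leave the study only by dying, so the number dying on a day equals the drop in LIVING from that day to the next.

```python
# Hypothetical sketch (not SYSTAT code): daily mortality rate as described for
# FRTFLYDM, i.e. flies dying during a day divided by flies alive (LIVING)
# at the beginning of that day.
from typing import List


def mortality_rates(living: List[int]) -> List[float]:
    """Return MORTRATE for each day, given counts alive at the start of each day.

    The number dying on day t is taken to be living[t] - living[t + 1];
    the last entry is expected to be 0 once all flies are dead.
    """
    rates: List[float] = []
    for start, next_start in zip(living, living[1:]):
        if start == 0:
            break  # no flies left, so the rate is undefined from here on
        dying = start - next_start
        rates.append(dying / start)
    return rates


if __name__ == "__main__":
    # Illustrative counts only, not values from FRTFLYDM.
    print(mortality_rates([100, 80, 60, 0]))  # [0.2, 0.25, 1.0]
```

Dividing by the number alive at the start of each day, rather than by the initial cohort size, makes the rate a conditional daily risk.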


GAUGE1: Smith (2001). The data set consists of repeated measurements (READING) of a
characteristic of ten items (ITEM), each by three persons (PERSON). 


GAUGE2 Montgomery and Runger (1993). Three operators each measure a quality characteristic
on twenty units twice.

GDP The data set consists of the CSO’s quarterly estimates of GDP growth rates for 1996-1997
to 2004-2005 for eight sectors and for overall GDP. The variables are YEAR$, AGRICULTURE,
MINING, MANUFACTURE, ELECTRICITY, CONSTRUCTION, TRADE, FINANCING,
COMMUNITY, and OVERALL-GDP.


GDWTRDM Nichols, Kane, Browning, and Cagle (1976). The U.S. Department of Energy
collected samples of groundwater in West Texas as part of a project to estimate U.S. uranium
reserves. Samples were taken from five different locations called producing horizons and
then measured for various chemical components. In addition, the latitude and longitude of
each sample location were recorded. The variables are:


SAMPLE The ID of the groundwater sample
LATITUDE Latitude at which the sample was taken
LONGTUDE Longitude at which the sample was taken
HORIZON$ Initials of producing horizon
HORIZON ID of producing horizon
URANIUM Uranium level in groundwater
ARSENIC Arsenic level in groundwater
BORON Boron level in groundwater
BARIUM Barium level in groundwater
MOLYBDEN Molybdenum level in groundwater
SELENIUM Selenium level in groundwater
VANADIUM Vanadium level in groundwater
SULFATE Sulfate level in groundwater
TOT_ALK Alkalinity of groundwater
BICARBON Bicarbonate level in groundwater
CONDUCT Conductivity of groundwater
PH pH of groundwater
URANLOG Log of uranium level in groundwater
MOLYLOG Log of molybdenum level in groundwater


GRADES The variables in this data set are the marks of six students (NAME$) in four quizzes
(QUIZ1, QUIZ2, QUIZ3, QUIZ4) and in the MIDTERM and FINAL exams.


GROWTH Each case in this file represents a group of plants receiving the same dose (DOSE) of 
a growth hormone. GROWTH is the mean growth measure for each group, and SE is the 
standard error of the mean. 


HARDDIA Taguchi (1989). The data set consists of measurements of two characteristics of a
product on 20 units: Brinell hardness number (BHN) and circular diameter
(DIAMETER).

HEAD Frets (1921). The data consist of measurements of the following characteristics of the first
and second sons of 25 families. The variables are:

HLEN1 Head length of the first son
HBREAD1 Head breadth of the first son
HLEN2 Head length of the second son
HBREAD2 Head breadth of the second son


HEADDIM Flury and Riedwyl (1988). These data are measurements of two hundred 20-year-old
male Swiss army personnel on the following characteristics:


MFB Minimal frontal breadth 
BAM Breadth of angulus mandibulae 
TFH True facial height 


LGAN Length from glabella to apex nasi 
LTN Length from tragion to nasion
LTG Length from tragion to gnathion 


HEART DASL (2005). An experiment was conducted by students at The Ohio State University
in the fall of 1993 to explore the relationship between a person's heart rate and the frequency
at which that person stepped up and down on steps of various heights. The response variable,
heart rate, was measured in beats per minute. There were two different step heights: 5.75
inches (coded as 0) and 11.5 inches (coded as 1). There were three rates of stepping: 14
steps/min (coded as 0), 21 steps/min (coded as 1), and 28 steps/min (coded as 2). This
resulted in six possible height/frequency combinations. Each subject performed the activity
for three minutes. Subjects were kept on pace by the beat of an electric metronome. One
experimenter counted the subject's pulse for 20 seconds before and after each trial. The subject
always rested between trials until her or his heart rate returned to close to the beginning rate.
Another experimenter kept track of the time spent stepping. Each subject was always
measured and timed by the same pair of experimenters to reduce variability in the experiment.
Each pair of experimenters was treated as a block. The variables are:


ORDER The overall performance order of the trial
BLOCK The subject and experimenters’ block number
HEIGHT 0 if stepping at the low (5.75") height, 1 if at the high (11.5") height
FREQUENCY The rate of stepping: 0 if slow (14 steps/min), 1 if medium (21 steps/min), 2 if high (28 steps/min)
RESTHR The resting heart rate of the subject before a trial, in beats per minute
HR The final heart rate of the subject after a trial, in beats per minute


HELM Helm (1959), reprinted by Borg and Lingoes (1987). These data contain highly accurate
estimates of “distance” between color pairs by one experimental subject (CB). Variables 
include A, C, E, G, I, K, M, O, Q, and S. 


HILLRACE Atkinson (1986). The data set gives the record-winning times (TIME) for 35 hill
races (RACE$) in Scotland. The distance (DISTANCE) travelled and the height climbed
(CLIMB) in each race are also given. The variables are:

RACE$ Name of the race
DISTANCE Distance covered in miles
CLIMB Elevation climbed during race in feet
TIME Record time for race in minutes


HILO These are hypothetical price data for a stock. HIGH is the highest price for that month
(MONTH and MONTH$), LOW is the lowest price, and CLOSE is the closing price at the end of
the month.
HISTAMINE Morrison and Zeppa (1963). These data have a multivariate layout. In this
study, mongrel dogs were divided into four groups of four. The groups received different drug
treatments. The dependent variable, blood histamine in mg/ml, was measured at four times
(HISTAMINE1, HISTAMINE2, HISTAMINE3, and HISTAMINE4) after administration of the
drug. The data are incomplete, since the last measurement is missing for one of the dogs.


HOSLEM Hosmer and Lemeshow (2000). The variables are:

ID Identification code
LOW Low infant birth weight
AGE Mother’s age
LWT Mother’s weight during last menstrual period
RACE Race (1 = white, 2 = black, 3 = other)
SMOKE Smoking status during pregnancy
PTL History of premature labor
HT Hypertension
UI Uterine irritability
FTV Number of physician visits during first trimester
BWT Birth weight


HOSLEMM Hosmer and Lemeshow (2000). This file is the HOSLEM data set with four added,
fictitious variables:

SETSIZE The number of subjects in each stratum (strata are defined by AGE in this analysis)
GROUP Identity number of the stratum
REC Case number
DEPVAR The relative position of the case in a given matched set


HW Hypothetical data on the height and weight of a group of people, by gender.


ILEA Goldstein (1987). This is a subset of data from the Inner London Education Authority (ILEA).
The data consist of information about 2069 students within 96 schools. The variables are:

ACH Measures of achievement
PFSM The percent of students within each school who are eligible to participate in a free meal program
VRA A verbal reasoning ability level from 1 to 3


INCOME The data here were collected from a class of students. There are two variables.
SCORES] represents the percent score of students on a statistics test, and INCOME the monthly
family income in thousands of dollars.
INSTRDM Huitema (1980). This data set consists of measures of achievement on a biology
exam for two groups of students. One group was simply told to study everything in a
biology text in general, and the other was given terms and concepts that they were expected
to master. An additional covariate, the student’s aptitude, is also included in the data set. The
variables are:

STUDENT Student ID
INSTRUCT$ Type of instruction given
INSTRUCT Coded variable for INSTRUCT$
APTITUDE Student’s underlying ability to learn
ACHIEVE Student’s score on the exam


IRIS Anderson (1935). These data measure sepal length (SEPALLEN), sepal width (SEPALWID),
petal length (PETALLEN), and petal width (PETALWID) in centimeters for three species
(SPECIES) of irises (1 = Setosa, 2 = Versicolor, and 3 = Virginica).


JOHN John (1971). These data are from an incomplete block design with three treatment factors
(A, B, and C), a blocking variable with eight levels (BLOCK), and the dependent variable (Y). 


JUICE Montgomery (2001). The number of defective orange juice cans (DEFECTS) found in
each of 24 samples (SAMPLE) of 50 juice cans. Data were collected on each of three shifts
(TIME$), with eight samples taken for each shift (SHIFTS). SIZE is also a variable.


JUICE1 Montgomery (2001). The following fictitious variable has been added to JUICE:

DEFECTS1 The number of defective orange juice cans found in each of 24 samples (SAMPLE) of 50 juice cans

KENTON Neter, Kutner, Nachtsheim, and Wasserman (1996). These data comprise unit sales
of a product (SALES) under different types of package designs (PACKAGE). Each case
represents a different store.


KOOIJMAN Kooijman (1979), reprinted in Upton and Fingleton (1990). The data consist of the
locations of beadlet anemones (Actinia equina) on the surface of a boulder at Quiberon Island,
off the Brittany coast, in May 1976.


KUEHL Kuehl (2000). The original data source is Dr. S. Denise, Department of Animal Sciences,
University of Arizona. A genetic study with beef animals consisted of several sires, each mated
to a separate group of dams. The matings that resulted in male progeny calves were used for
an inheritance study of birth weights. The birth weights of eight male calves in each of five
sire groups are given. The variables are SIRE, BIRTHW, PROGENY, and GR.
LAB Jackson (1991). The data set consists of four bivariate vector observations per laboratory.
Samples were tested in three different laboratories (LAB) using two different methods
(METHOD1, METHOD2), and each LAB received four samples.

LABOR U.S. Bureau of Labor Statistics. These data show output productivity per labor hour in
1977 U.S. dollars for a 25-year period (YEAR). The other variables are US, CANADA, JAPAN,
GERMANY, and ENGLAND.

LATIN Neter, Kutner, Nachtsheim, and Wasserman (1996). These data are from a Latin square
design in which the response (RESPONSE) in each square (SQUARE) is from one of five days
of the week (DAY) for five weeks (WEEK).

LAW Efron and Tibshirani (1993). The law school data: a random sample of size 15 taken
from the universe of 82 U.S. law schools. The two variables are the average score on a national
law test (LSAT) and the average undergraduate grade-point average (GPA).

LEAD Ott and Longnecker (2001). The data set consists of lead concentrations (mg/kg dry
weight) at 37 stations in Kenya, obtained from a geochemical and oceanographic survey of the
inshore waters of Mombasa, Kenya.

LEARN Gilfoil (1982). These data demonstrate a quadratic function with a ceiling. They are from
a study showing that inexperienced computer users prefer dialog menu interfaces while 
experienced users prefer command-based interfaces. SESSION is the session number, and 
TASKS is the number of command-based (as opposed to dialog-based) tasks initiated by the 
user during that session. 

LEISURE Clausen (1998). These data show a cross-classification of leisure activities by
occupational status. The activities and occupational statuses are listed below, with the
SYSTAT names in parentheses.

Activities
Sports Events (Sports)
Cinema (Cinema)
Dance/Disco (Dance)
Cafe/Restaurant (Cafe)
Theatre (Theatre)
Art Exhibition (Art)
Library (Library)
Church Service (Church)
Classical Music (Classical)
Pop (Pop)

Occupational Status
Manual (MANUAL)
Low Non Manual (LOWNM)
High Non Manual (HIGHNM)
Farmer (FRAMER)
Student (STUDENT)
Retired (RETIRED)


LIFE The data are lifetimes (LIFE) of 20 units of a certain type of equipment.
LONGLEY Longley (1967). These economic data were selected by Longley to illustrate
computational shortcomings of statistical software. The variables are DEFLATOR, GNP,
UNEMPLOY, ARMFORCE, POPULATN, TIME, and TOTAL.


LUNGDIS Hand, Daly, Lunn, McConway, and Ostrowski (1996). This data set consists of
monthly (MONTH$) deaths (DEATHS) from lung diseases in the UK during the years (YEAR) 
1974 to 1979. 


MACHINE These data are in the file MACHINE and represent the numbers (N) of conforming
(RESULT is 1) and nonconforming (RESULT is 0) units produced by each of five machines. 


MACHINE1 Milliken and Johnson (1992). A company conducted an experiment to compare the
performances of three different brands of machines when operated by the company's own
personnel. Six employees were selected at random, and each of them operated each machine
three different times. The data set consists of overall scores that take into account both the
quantity and quality of the output. The variables are SCORE, MACHINE, OPERATOR, and
TIME.


MACHINE2 Milliken and Johnson (1992). This is an unbalanced data set in which two machines
were operated by six randomly selected operators. Each operator was allowed to operate each
machine at most three times.

MACK Breslow and Day (1980). The data deal with cases of endometrial cancer in a
retirement community near Los Angeles and are reproduced in their Appendix III. The
variables are:


CANCER 

AGE 

GALL Gallbladder disease 

HYP Hypertension 

OBESE Obesity 

EST Estrogen 

DOS Dose 

DUR Duration of conjugated estrogen exposure 
NON Other drugs 


The data are organized by sets, with the case coming first, followed by four controls, and so on, 
for a total of 315 observations (63 * (4 + 1)). 
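The Python sketch below rebuilds the file order of the matched sets to show how the 315 rows break down. The helper columns `set_id` and `is_case` (1 for the case, 0 for a control) are hypothetical illustrations. They are not variables in MACK.

```python
def matched_set_layout(n_sets=63, controls_per_case=4):
    """List hypothetical (set_id, is_case) rows in file order: case first, then its controls."""
    rows = []
    for set_id in range(1, n_sets + 1):
        rows.append((set_id, 1))  # the case opens each matched set
        rows.extend((set_id, 0) for _ in range(controls_per_case))
    return rows


rows = matched_set_layout()
print(len(rows))  # 315
print(rows[:6])   # [(1, 1), (1, 0), (1, 0), (1, 0), (1, 0), (2, 1)]
```

A stratum identifier like `set_id` is what a conditional (matched) analysis needs in order to compare each case only with its own controls.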


MANOVA Morrison (1990). These data are from a hypothetical experiment measuring weight
loss in rats. Each rat was assigned randomly to one of three drugs (DRUG), with weight loss
measured in grams for the first and second weeks of the experiment (WEEK(1) and WEEK(2)).
SEX was another factor. 
MELNMADM Wilkinson and Engelman (1996). This data set contains reports on melanoma
patients. The variables are:


TIME The survival time for melanoma patients in days 
CENSOR The censoring variable 

WEIGHT The weight variable 

ULCER Presence or absence of ulcers 

DEPTH Depth of ulceration 

NODES Number of lymph nodes that are affected 

SEX$ The sex of the patient 

SEX The stratification variable coded for analysis 


METOX Fellner (1986). The data set concerns a metallic oxide analysis in which two types of
metallic oxide were used, with eighteen lots of the first type and thirteen of the second. Two
samples were drawn from each lot, and a pair of chemists was randomly selected for each
sample. The variables are TYPE, SAMPLE, CHEMIST, and Y.


MILK Brownlee (1960). The data set pertains to bacteriological testing of milk. Twelve milk
samples (SAMPLE) were tested in all six combinations of two types of bottles (BOTTLE$) and
three types of tubes (TUBES). Ten tests were run on each combination, and the response was
the number of positive tests in each set of ten (Y).


MINIWRLD This data file is a subset of OURWORLD.


MINTEMP Barnett and Lewis (1967). The data set consists of a variable TEMP, the annual
minimum temperature (°F) at Plymouth (in Britain) for 49 years.


MISSLES Jackson (1991). These data are a covariance matrix of measures performed on 40 Nike
rockets. The variables are INTEGRA1, PLANMTR1, INTEGRA2, and PLANMTR2.


MJ006 Milliken and Johnson (1984). This data set came from an experiment conducted
to determine how six different kinds of work tasks (TASK) affect a worker's pulse rate. In this
experiment, 78 male workers were assigned at random to six different groups so that there
were 13 workers in each group. Each group of workers was trained to perform its assigned
task. On a selected day after training, the pulse rates (PULSE) of the workers were measured
after the workers had performed their assigned tasks for one hour. Unfortunately, some
individuals withdrew from the experiment during the training process, so some groups
contained fewer than 13 individuals. The recorded data represent the number of pulsations in
20 seconds.


MJ020 Milliken and Johnson (1984). The data set is from a paired-association learning task
experiment performed on subjects under the influence of two drugs. Group1 is a control (no
drug), Group2 was given drug1, Group3 was given drug2, and Group4 was given both drugs.
The variables are LEARNING and GROUP.


MJ129 Milliken and Johnson (1984). The data set is from a small two-way treatment structure 
experiment conducted in a completely randomized design structure. 


MJ166 Milliken and Johnson (1984). A bakery scientist wanted to study the effects of combining
three different fats (FAT) with each of three different surfactants (SURF) on the specific 
volume of bread loaves (SPVOL) baked from doughs mixed from each of the nine treatment 
combinations. Four flours (FLOUR) of the same type but from different sources were used as 
blocking factors. That is, loaves were made using all nine treatment combinations for each of 
the four flours. 


MJ173 Milliken and Johnson (1984). This is a hypothetical data set from a two-way treatment
structure in a completely randomized design with treatment T and treatment B each having 
three levels. 


MJ202 Milliken and Johnson (1984). These data are from a home economics survey experiment.
DIFF is the change in test scores between pre-test and post-test on a nutritional knowledge 
questionnaire. GROUP classifies whether or not a subject received food stamps. AGE 
designates four age groups, and RACE$ designates whites, blacks, and Hispanics. 


MJ332 Milliken and Johnson (1984). An experiment with 3 drugs studied the effect of each
drug on the heart rate of eight persons over four time periods. The variables are PERSON, HR,
DRUG, and TIME.


MJ338 Milliken and Johnson (1984). An engineer had three environments in which to test two
types of clothing. Four people (two males and two females) were put into an environmental
chamber, and each chamber was assigned one of the three environments. One male and one
female wore clothing type 1, and the other male and female wore clothing type 2. The comfort
score of each person was recorded at the end of one hour (SCORE(1)), two hours (SCORE(2)),
and three hours (SCORE(3)).


MJ379 Milliken and Johnson (1984). An experimenter wanted to study the effects of three
different herbicides (HERB) and four fertilizers (FERT) on the growth rate of corn. Fifteen
plots of land (PLOT) were available for the experiment, and 5 plots were randomly assigned
to each of the three herbicides. Each of the 15 plots was further divided into 4 subplots, and
a different fertilizer treatment was randomly assigned to each. At the beginning of the third
week, 10 plants were selected at random from each subplot, and the height of each plant was
measured. The average of the 10 heights (HEIGHT) was recorded as the measurement from
the subplot. Unfortunately, before any measurements could be taken, 3 of the 15 whole plots
were destroyed by excessive rainfall. Herbicide 1 had been assigned to two of those whole plots
and herbicide 3 to the third.




MJ385 Milliken and Johnson (1984). These data form a small part of an experiment conducted to
determine the effects of a drug on the scores obtained by depressed patients on a test to
measure depression. Two patients were in the placebo group, and three in the drug group. The
variables are SCORE, WEEK, PATIENT, and TREAT$.


MOTHERS Morrison (1990). These data are hypothetical profiles on three scales of mothers
(SCALE(1) to SCALE(3)) in each of four socioeconomic classes (CLASS). Other variables are 
A$, B$, C$, A, B, and C. 


MRCURYDM Lange et al. (1993). The data set consists of measurements of largemouth bass in
53 different Florida lakes, collected to examine the factors that influence the level of mercury
contamination. Water samples were collected, from which the pH level and the amounts of
chlorophyll, calcium, and alkalinity were measured. A sample of fish was taken from each
lake, and the age of each fish and the mercury concentration in its muscle tissue were
measured (older fish tend to have higher concentrations). To make a fair comparison of the
fish in different lakes, the investigators used a regression estimate of the expected mercury
concentration in a three-year-old fish as the standardized value for each lake. Finally, in 10 of
the 53 lakes, the age of the individual fish could not be determined, and the average mercury
concentration of the sampled fish was used instead. The variables are:


ID Lake ID
LAKE$ Lake name
ALKLNTY Measured alkalinity of the lake (mg/L as calcium carbonate)
PH Measured pH of the lake
CALCIUM Measured calcium of the lake (mg/L)
CHLORO Measured chlorophyll of the lake (mg/L)
AUGMERS Average mercury concentration (parts per million) in the tissue of the fish sampled from the lake
SAMPLES Number of fish sampled in the lake
MIN Minimum mercury concentration in sampled fish from lake
MAX Maximum mercury concentration in sampled fish from lake
STOMER Regression estimate of the mercury concentration in a 3-year-old fish from the lake
AGEDATA Indicator of the availability of age data on fish sampled
LNCHLORO Log of CHLORO


MULTIRESP Myers & Montgomery (2002). This data set contains observations on three
responses at different level combinations of two factors, time (TIME) and temperature
(TEMP), of a chemical process. The three responses are yield (YIELD), viscosity (VISCOSITY),
and number-average molecular weight (MOLWEIGHT). The data set also contains coded
versions of the factors: X1 is the coded version of TIME, and X2 is the coded version of
TEMP.
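The text does not state how TIME and TEMP were coded. A common convention in response-surface work maps the low and high levels of a factor to -1 and +1. The Python sketch below assumes that convention. The helper `code_factor` and the levels in the example are hypothetical.

```python
def code_factor(value, low, high):
    """Map a natural level onto the coded scale: low -> -1, center -> 0, high -> +1."""
    if high == low:
        raise ValueError("low and high levels must differ")
    center = (low + high) / 2
    half_range = (high - low) / 2
    return (value - center) / half_range


# Hypothetical TIME levels of 80 and 90 minutes.
print([code_factor(t, 80, 90) for t in (80, 85, 90)])  # [-1.0, 0.0, 1.0]
```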

NAFTA Two months before the approval of the North American Free Trade Agreement, and before the
televised debate between Vice President Al Gore and businessman Ross Perot, political
pollsters queried a sample of 350 people, asking “Are you For, Unsure, or Against NAFTA?”
After the debate, the pollsters contacted the same people and asked the question a second time.
Variables include BEFORE$, AFTER$, and COUNT.

NEWARK Collected by the U.S. Government and cited in Chambers et al. (1983). These data
are 64 average monthly temperatures (TEMP) in Newark, New Jersey, beginning with
January 1964.

NFL Johnson (1999). The data set was obtained from the NFL for the 1999-2000 season and covers
players with at least 1,500 passing attempts. It contains NFL passer rating data: RATING is based
on performance standards established for completion percentage, average gain, touchdown
percentage, and interception percentage. The variables are:

NAME$ Last name and first name of quarterback
ATTEMPTS Passing attempts
COMPLETIONS Percentage of completions per attempt
YARDS Average yards gained per attempt
TDS Percentage of touchdown passes per attempt
INTS Percentage of interceptions per attempt
RATING NFL passer rating (rounded to the nearest 0.1)
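The standard NFL passer rating converts each of the four statistics into a component capped between 0 and 2.375. The rating is the component sum divided by 6 and multiplied by 100. The Python sketch below assumes, as described above, that COMPLETIONS, TDS, and INTS are percentages and YARDS is yards per attempt. `passer_rating` is a hypothetical helper, not a SYSTAT function.

```python
def passer_rating(comp_pct, yds_per_att, td_pct, int_pct):
    """NFL passer rating from per-attempt percentages and average gain."""
    def clamp(v):
        # each component is bounded below by 0 and above by 2.375
        return max(0.0, min(v, 2.375))

    a = clamp((comp_pct - 30) * 0.05)     # completion percentage
    b = clamp((yds_per_att - 3) * 0.25)   # average gain per attempt
    c = clamp(td_pct * 0.2)               # touchdown percentage
    d = clamp(2.375 - int_pct * 0.25)     # interception percentage
    return (a + b + c + d) / 6 * 100


print(round(passer_rating(77.5, 12.5, 11.875, 0.0), 1))  # 158.3 (the maximum)
print(passer_rating(30, 3, 0, 9.5))                      # 0.0 (the minimum)
```

Rounding the result to one decimal place matches the precision of RATING in this file.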


NLS The data used here have been extracted from the National Longitudinal Survey of Young
Men (1979) and contain school enrollment information on 200 individuals. The variables are:

NOTENR School enrollment status (1 if not enrolled, 0 otherwise)
BLACK A race dummy (0 for white)
SOUTH A region dummy (0 for non-South)
EDUC Highest completed grade
AGE Age
FED Father’s education
MED Mother’s education
CULTURE An index of reading material available in the home (1 for least, 3 for most)
NSIBS Number of siblings
LW Log10 of wage
IQ An IQ measure
FOMY Mean income of persons in father’s occupation in 1960


OPERA The following data are from an editorial in The New York Times (December 3, 1987).
They represent the duration (HOURS) of various plays, films, and operas (TITLE$).


OURWORLD Variables recorded for each case (country) include:

COUNTRY$ Names of the 95 countries used in this data file
URBAN Percentage of population living in urban areas
LIFEEXPF, LIFEEXPM Years of life expectancy for females and males
GDP$ Group variable with codes “Developed” and “Emerging”
GDP_CAP Gross domestic product per capita in U.S. dollars
BABYMORT, BABYMT82 BABYMORT = infant mortality rate for 1990; BABYMT82 = infant mortality rate in 1982
BIRTH_RT Number of births per 1000 people in 1990
DEATH_RT Number of deaths per 1000 people in 1990
BIRTH_82, DEATH_82 Number of births and deaths per 1000 people in 1982
B_TO_D Birth to death ratio in 1990
HEALTH, EDUC, MIL, HEALTH84, EDUC_84, MIL_84 Expenditures (in U.S. dollars) per person for health, education, and the military in 1990 and in 1984
POP_1983, POP_1986, POP_1990, POP_2020 Populations in millions for the years 1983, 1986, and 1990; POP_2020 is the population projected by the United Nations for 2020
GNP_82, GNP_86 Gross national product in 1982 and 1986
RELIGION$ Expenditures grouped by the religion or personal philosophy of those who govern the country
GOV$ Type of government
LEADER$ Religion of the leaders of countries
LITERACY Percentage of the population that can read
GROUP$ Europe, Islamic, or the New World
URBAN$ Rural or urban
MCDONALD Number of McDonald’s restaurants per country
LAT, LON Latitude and longitude measurements of the center of the country
B_TO_D82 Birth to death ratio in 1982
LOG_GDP Log of gross domestic product per capita
LIFE_EXP Years of life expectancy


PAINTS Milliken and Johnson (1992). The data set consists of four different paints, Yellow 1,
Yellow 2, White 1, and White 2, manufactured by two different companies; the 1 and 2 refer
to the company. Each paint is applied to three different paving surfaces: Asphalt1, Asphalt2,
and Concrete. The response is the lifetime measured in weeks. Because only the cell means and
error sum of squares were reported for the original data, this data set has been generated
artificially to have the same cell means and error sum of squares as the original data. The
variables are Y, PAINT$, and PAVE$.

PAROLE Maltz (1984). These data record the number of Illinois parolees (COUNT) who failed
conditions of their parole after a certain number of months (MONTH). An additional 149 
parolees failed after 22 months, but these are not used. 


PATMISS Hocking (2003). In an experiment, a pharmaceutical company tested a new
medicine. Three clinics were selected at random from a large number of clinics, and the drug
was administered to ten randomly selected patients. However, some measurements from
some of the clinics were not reported. The variables are CLINIC and Y.


PATTERN Laner, Morris, and Oldfield (1957). In a psychological experiment on visual
perception, 1555520 squares were colored, each black with probability 0.29 or white with
probability 0.71. From these, 1000 non-overlapping samples of 16 small squares each were
selected at random, and the number of black squares was counted in each sample. The data
set consists of the frequency distribution of this count.
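If each square is colored black independently with probability 0.29, then the number of black squares in a sample of 16 follows a Binomial(16, 0.29) distribution. That gives expected frequencies to compare with the observed ones. The Python sketch below computes them under this independence assumption. It is an illustration and not necessarily the analysis used in the original study.

```python
from math import comb


def expected_black_counts(n_samples=1000, squares=16, p_black=0.29):
    """Expected number of samples with k black squares, k = 0..squares (binomial model)."""
    return [
        n_samples * comb(squares, k) * p_black**k * (1 - p_black) ** (squares - k)
        for k in range(squares + 1)
    ]


expected = expected_black_counts()
for k, e in enumerate(expected):
    print(k, round(e, 1))
```

The expected counts sum to 1000. The most likely count is 4 black squares, which is near the mean 16 × 0.29 = 4.64.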


PATTISON Clarke (1987). In his 1987 JASA article, C. P. Y. Clarke discusses data taken
from an unpublished thesis by N. B. Pattinson on 13 grass samples collected in a pasture.
Pattinson recorded the weeks since grazing began in the pasture (TIME) and the weight of
grass cut from 10 randomly sited quadrats, then fit the Mitscherlich equation:


$$\mathrm{GRASS} = \theta_1 + \theta_2 \, e^{-\theta_3 \, \mathrm{TIME}}$$
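The Python sketch below evaluates the model curve, assuming the form GRASS = θ1 + θ2·exp(-θ3·TIME). For θ3 > 0, the curve equals θ1 + θ2 at TIME = 0 and approaches the asymptote θ1 as TIME grows. The parameter values are hypothetical and are not estimates from these data.

```python
import math


def mitscherlich(time, theta1, theta2, theta3):
    """Mitscherlich curve: theta1 + theta2 * exp(-theta3 * time)."""
    return theta1 + theta2 * math.exp(-theta3 * time)


# Hypothetical parameters: asymptote 10, starting value 10 - 8 = 2, rate 0.3 per week.
for week in (0, 5, 20):
    print(week, round(mitscherlich(week, 10.0, -8.0, 0.3), 3))
```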


PDLEX1 Gujarati (1995). The data set relates to the SALES and INVENTORY of a product over 20
days.

PDLEX2 Gujarati (2003). The data set relates to the SALES and INVENTORY of a product for the
United States for the period 1954-1999. 


PDLEX3 Gujarati (2003). The data set relates to an income-money supply model of the USA for the
period 1970-1999. The variables are as follows:

GDP Gross domestic product ($, billions, seasonally adjusted)
M2 Money supply ($, billions, seasonally adjusted)
GDPI Gross private domestic investment ($, billions, seasonally adjusted)
FEDEXP Federal government expenditure ($, billions, seasonally adjusted)
TB6 Six-month Treasury bill rate (%)
PESTICIDE Milliken and Johnson (1992). Four chemical companies produce certain pesticides.
Company A produces three such products, companies B and C produce two such products
each, and company D produces four such products. No company produces a product exactly
like that of another. The treatment structure is a two-way structure with COMPANY$ as one
factor and PESTICIDE as the other. To compare these products, 33 glass containers were
randomly grouped into eleven groups of three, and the pesticides were assigned randomly to
the groups. The assigned pesticide was applied to the inside of each box in its group. A box
with 400 mosquitoes and soil with bluegrass was put inside each container, and the number of
live mosquitoes in each box was counted after 4 hours (Y).


PESTRESIDUE Kuehl (2000). Two standard pesticide methods (METHOD) were compared to test
whether the amount of residue left on cotton plant leaves is the same for both methods. To test
this, six batches (BATCH) of plants were sampled from the field, and two plants from each
batch were used in the experiment, giving twelve plants in all (SAMPLE). The plants within
each batch came from the same field plot. Method one was applied to three randomly selected
batches, and the remaining three batches were given method two. The amount of residue on
the leaves was measured after a specified amount of time for each of the twelve plants (Y).


PHONECAL Rousseeuw and Leroy (1987). The data set, which comes from the Belgian
Statistical Survey, describes the number of international phone calls from Belgium in the years
1950-1973. The variables are:


X Years 
Y Number of phone calls 


PHOSPHOR Hocking (1985). The data set concerns the concentration of phosphorus in wash
water. The aim of the investigation was to determine how the concentration varies with the type
of detergent and washing machine. The experiment was carried out with four types of
detergent, three types of machine, and seven laundromats. The laundromats had different
numbers of machines, but each laundromat had machines of only a single type. Thus,
laundromats are nested within machine types. The machines within each laundromat were
divided into four groups of roughly equal size, and the four types of detergent were allocated
to them. The response is the average amount of phosphorus in grams per liter from daily
one-hour samples over a seven-day period. The variables are Y, N, MACHINE, LAUNDRY,
and DETERG.


PHYSICAL Crowder and Hand (1990). The data set shows three groups of diabetic patients and
one control group (GROUP). The response variable is observed at 12 time points, and the
corresponding variables are X1, X2, and Y1 through Y10, respectively.
PISTON Taguchi, Elsayed, and Hsiang (1989). This data set consists of diameter differences (DIA)
between the cylinder and the piston of a six-cylinder engine. The sample was selected from a
month’s (MONTHS) production of an automobile manufacturing unit.


PLANKS: Netmaster Statistics Courses. After beech wood is dried, the humidity level at any given
point inside a plank typically depends on the depth of the point. To study the relation between
the humidity level (measured as a percentage) and the depth, twenty randomly selected beech
planks were measured for humidity level at five depths and three widths. The variables are
PLANK, WIDTH, DEPTH, and HUMIDITY.


PLANTS: SYSTAT created this file to demonstrate regression with ecological or grouped data.
The variables are: CO2, SPECIES, and COUNT. 


PLOTS: The split plot design is closely related to the nested design. In the split plot, however,
plots are often considered a random factor. Thus, different error terms are constructed to test 
different effects. Here is an example involving two treatments: A (between plots) and B 
(within plots). The numbers in the cells are YIELD of the crop within plots. These data also 
use PLOT, PLOT(1), and PLOT(2) as variables. 


POLAR: These data show the highest frequency (FREQ) (in 1000’s of cycles per second) 
perceived by a subject listening to a constant amplitude sine wave generator oriented at 
various angles relative to the subject (ANGLE). 

POLYNOM: The following variables were created in SYSTAT using the equations

$$X = u + i - 10$$

$$Y = 2 + 3X + 4X^2 + 5X^3 + 500z$$

where u is a uniform random variable, i is an index running from 1 to 20, and z is a standard
normal random variable. The variable ESTIMATE was estimated from a cubic regression
model. Finally, the variables UPPER and LOWER were computed: UPPER corresponds to
two standard errors above the estimated value, and LOWER corresponds to two standard errors
below.
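The two generating equations can be sketched in Python as follows. This is a minimal illustration, not SYSTAT's own procedure: it assumes u is drawn from the standard uniform distribution on [0, 1) and that u and z are redrawn for each i (the text does not say either), and the helper name `make_polynom` and its seed are hypothetical. The cubic fit that produces ESTIMATE, UPPER, and LOWER is left to a regression routine.

```python
import random

def make_polynom(seed=1):
    """Simulate X and Y from the POLYNOM equations (hypothetical helper).

    Assumes u ~ Uniform[0, 1) and z ~ N(0, 1), drawn independently for each i.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for i in range(1, 21):          # i runs from 1 to 20
        u = rng.random()            # uniform random variable on [0, 1)
        z = rng.gauss(0.0, 1.0)     # standard normal random variable
        x = u + i - 10
        y = 2 + 3 * x + 4 * x**2 + 5 * x**3 + 500 * z
        xs.append(x)
        ys.append(y)
    return xs, ys

xs, ys = make_polynom()
```

Under these assumptions X lies in [-9, 11) and increases with i, so the cubic term dominates Y at the extremes while the 500*z noise dominates near X = 0.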


POWER: Ott and Longnecker (2001). The data set consists of deviations from target power 
(POWER) using monomers from three different suppliers (SUPPLIER) with a total number of 
27 cases. 


PROCESS: Breyfogle (2003). The data set consists of the number of units checked and the number
of defects found in 10 operation steps in a production process.
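These counts support the defects-per-unit (DPU) ratio: defects found divided by units checked. A minimal Python sketch, using made-up counts for three steps rather than the actual PROCESS values; the helper name `dpu` is hypothetical.

```python
def dpu(defects, units):
    """Defects per unit: defects found divided by units checked."""
    if units <= 0:
        raise ValueError("units checked must be positive")
    return defects / units

# Hypothetical (units checked, defects found) for three operation steps.
steps = [(500, 10), (480, 24), (450, 9)]
per_step = [dpu(d, u) for u, d in steps]

# Pooled DPU over all steps: total defects over total units checked.
overall = dpu(sum(d for _, d in steps), sum(u for u, _ in steps))
```

The pooled value weights each step by its number of units, so it is not the simple average of the per-step ratios.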
PULPFIBER: Lee (1992). The data set contains 62 measurements on the properties of pulp fibers
and the paper made from them. 


Four types of pulp fiber characteristics are: 


X1 Arithmetic fiber length
X2 Long fiber fraction 

X3 Fine fraction 

X4 Zero span tensile 


The four paper properties are: 


Y1 Breaking length
Y2 Elastic modulus 
Y3 Stress at failure 
Y4 Burst strength 


PUMPFAILURES: Gaver and O’Muircheartaigh (1987). The data set consists of the number of
failures (F) and times of observation (T) for 10 pump systems at a nuclear power plant. 
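A natural first summary of these data is each pump's raw failure rate, F divided by T, expressed per unit of whatever time scale T uses. A minimal sketch with made-up (F, T) pairs, not values from the file; `failure_rate` is a hypothetical helper.

```python
def failure_rate(failures, time):
    """Raw failure rate: number of failures (F) over observation time (T)."""
    if time <= 0:
        raise ValueError("observation time must be positive")
    return failures / time

# Made-up (F, T) pairs for three pumps, not values from PUMPFAILURES.
pumps = [(5, 100.0), (1, 20.0), (14, 140.0)]
rates = [failure_rate(f, t) for f, t in pumps]
```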


PUNCH: Cornell (1985). These data measure the effects of various mixtures of watermelon
(WATERMELN), pineapple (PINEAPPL), and orange juice (ORANGE) on taste ratings by 
judges (TASTE) of a fruit punch. 


QUAD: Cook and Weisberg (1990). The data set comes from a quadratic function, which reaches its
maximum at $X = -b/(2c)$; however, for the data given by Cook and Weisberg, this maximum is
close to the smallest X. In other words, little of the response curve is found to the left of the maximum.
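Setting the derivative of the quadratic to zero shows where the location -b/(2c) comes from. This worked step assumes the function is written as $y = a + bX + cX^2$ with $c < 0$; the coefficient names a, b, and c are an assumption chosen to match the expression in the text.

$$
y = a + bX + cX^2, \qquad \frac{dy}{dX} = b + 2cX = 0 \;\Longrightarrow\; X^{*} = -\frac{b}{2c}, \qquad \frac{d^2y}{dX^2} = 2c < 0 .
$$

Because the second derivative is negative, $X^{*}$ is a maximum; when $X^{*}$ sits near the smallest observed X, the data say little about the curve on the far side of the peak.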


QUAKES: The Open University (1981). The data set consists of TIME in days between successive
serious earthquakes worldwide. 


RAINFALL: Lee (1989). This is a data set of December rainfall (Y) on November rainfall (X) from
1971 to 1980. 

RANSAMPLE: The data set consists of 100 random observations on (X, Y, Z), where X follows the
standard normal distribution, Y given X follows a normal distribution with mean X and standard
deviation 1, and Z given (X, Y) follows a normal distribution with mean X and Y and standard
deviation 1. The data set was generated using SYSTAT.


RATGROWTH: Milliken and Johnson (1992). This experiment involved studying the effect of a
dose of a drug on the growth of rats. The data set consists of the growth of fifty rats, where ten
rats were randomly assigned to each of the five doses of the drug. The weights were obtained
each week for eleven weeks. The variables are DOSE, RAT, WEEK, and WEIGHT.


RATS: Morrison (1990). For these data, six rats were weighed at the end of each of five weeks
(WEIGHT(1) to WEIGHT(5)). 


RCITY: Adapted from a Swiss Bank pamphlet. These data include 46 international cities (CITY$),
the name of the continental region (REGION$), average working hours per week (WORKWEEK),
working time (in minutes) to buy a hamburger and a large portion of french fries (BIG_MAC),
average cost (in U.S. dollars per basket) of a basket of goods and services (LIVECOST), net
hourly earnings (EARNINGS), and percentage of taxes and social security paid by the worker (PCTTAXES).


REACT: These data involve yields of a chemical reaction (YIELD) under various combinations of
four binary factors (A, B, C, and D). Two reactions are observed under each combination of 
experimental factors, so the number of cases per cell is two. 


REGORTHO: The data set consists of 25 random observations on (X, Y) with X2 = $X^2$, X3 = $X^3$,
X4 = $X^4$, and X5 = $X^5$, where X follows a normal distribution with mean 5 and standard deviation
1, and Y given X follows a normal distribution with mean $1 - X + X^2$ and standard deviation 1. The data
set was generated using SYSTAT. The variables in this data set are X, Y, X2, X3, X4, and X5.


REPEAT: Winer (1971). These data contain two grouping factors (ANXIETY and TENSION)
and one trial factor (TRIAL(1) to TRIAL(4)).


REPEAT2: Winer (1971). This data set has one grouping factor (NOISE) and two trial factors
(period and dial). The trial factors must be entered as dependent variables in a MODEL
statement, so the variables are named P1D1, P1D2, ..., P3D3. For example, P1D2 means a
score in the {period1, dial2} cell.
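The naming scheme can be generated mechanically. This sketch assumes three periods and three dials, as the range P1D1 to P3D3 implies, and lists dial varying fastest within period, following the order given in the text.

```python
# Period varies slowest and dial fastest, matching P1D1, P1D2, ..., P3D3.
names = [f"P{p}D{d}" for p in range(1, 4) for d in range(1, 4)]
print(names)
# ['P1D1', 'P1D2', 'P1D3', 'P2D1', 'P2D2', 'P2D3', 'P3D1', 'P3D2', 'P3D3']
```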


RIESBY: Reisby et al. (1977) studied the relationship between desipramine and imipramine levels
in plasma in 66 depressed patients classified as either endogenous or nonendogenous. After 
receiving a placebo for one week, the researchers administered a dose of imipramine each day 
for four weeks, recording the imipramine and desipramine levels at the end of each week. At 
the beginning of the placebo week and at the end of each week (including the placebo week), 
patients received a score on the Hamilton depression rating scale. A diagnosis of endogenous 
or non-endogenous depression was made for each patient. Although the total number of 
subjects in this study was 66, the number of subjects with all measures at each of the weeks 
fluctuated: 61 at week 0 (start of placebo week), 63 at week 1 (end of placebo week), 65 at 
week 2 (end of first drug treatment week), 65 at week 3 (end of second drug treatment week),
63 at week 4 (end of third drug treatment week), and 58 at week 5 (end of fourth drug
treatment week). The variables are ID, HAMD, CONSTANT, WEEK, ENDOG, and ENDOGWK.


RLONGLEY: Longley (1967). The data were originally used to test the robustness of
least-squares packages to multicollinearity and other sources of ill conditioning. The variables in
this data set are TOTAL, DEFLATOR, GNP, UNEMPLOY, ARMFORCE, POPULATN, and
TIME.


ROCKET: Components A, B, and C are mixed to form a rocket propellant. The elasticity of the
propellant (ELASTIC) was the dependent variable. The other variable is RUN. 


ROHWER: Timm (2002). The data set is based on the performance of 32 kindergartners in three
standardized tests: the Peabody Picture Vocabulary Test (PPVT), the Raven Progressive Matrices
Test (RPMT), and a student achievement test (SAT). The independent variables are named (N),
still (S), named still (NS), named action (NA), and sentence still (SS).


ROTATE: Metzler and Shepard (1974). These data measure reaction time in seconds (RT) versus
angle of rotation in degrees (ANGLE) in a perception study. The experiment measured the
time it took subjects to make “same” judgments when comparing a picture of a
three-dimensional object to a picture of possible rotations of the object.

ROTHKOPF: Rothkopf (1957). These data are adapted from an experiment by Rothkopf in which
598 subjects were asked to judge whether two Morse code signals presented in succession
were the same. All possible ordered pairs were tested. For multidimensional scaling, the data
for letter signals are averaged across sequence, and the diagonal (pairs of the same signal) is
omitted. The variables are A through Z.


RYAN: Ryan (2002). Y1 and Y2 are the control variables, and SAMPLE is the sample identifier.


SALARY: These data compare the low and high salaries of executives in a particular firm. The
variables are SEX, EARNINGS, and COUNT.

SCHOOLS: Neter, Kutner, Nachtsheim and Wasserman (1996). These data comprise a nested
design where two teachers from each of three different schools are rated. SCHOOL indicates
the school that the case describes. Each teacher variable (TEACHER(1-3)) represents a
different school; a value of “1” indicates teacher 1 for that school, “2” indicates teacher 2 for 
that school, and “0” indicates that the teacher does not teach at that school. LEARNING 
measures the teacher’s effectiveness (the higher, the better). 

SCORES: Hand et al. (1996). The data set shows the results of 10 students sitting 14 examination
papers for a degree in Statistics. Each result is a percentage. The variables are
TEST1 to TEST14.

SERUM: Crowder and Hand (1990). The data set consists of the antibiotic serum levels with two
types of drugs applied to the same group of volunteers in two phases at different time points 
(TIME1, TIME2, TIME3, TIME6). 

SICKDATE: The data file lists the diagnosed date of each patient's illness (DIAGDATE) and the
date each died (MORTDATE). These dates are listed in day-of-the-century format. 
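Because DIAGDATE and MORTDATE count days from the same origin, the time from diagnosis to death is simply their difference, so the origin itself never needs to be known. A minimal sketch with made-up day numbers; `days_to_death` is a hypothetical helper.

```python
def days_to_death(diagdate, mortdate):
    """Elapsed days between two day-of-the-century values."""
    if mortdate < diagdate:
        raise ValueError("MORTDATE precedes DIAGDATE")
    return mortdate - diagdate

# Made-up day-of-the-century values (DIAGDATE, MORTDATE) for two patients.
pairs = [(30120, 30485), (31001, 31032)]
survival = [days_to_death(d, m) for d, m in pairs]
```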
SIMUL and SIMUL2: These data contain three variables: Y, I, and J. Y is generated from
N(0, 1.57).


SLEEPDM: Allison and Cicchetti (1976). This data set contains information from a study relating
physical and biological characteristics, sleep patterns, and the danger of a mammal being eaten
by predators. The study includes data on the hours of dreaming and nondreaming sleep,
gestation age, and body and brain weight for 62 mammals. The variables are:

SPECIES$ Type of species 

BODY Body weight of the mammal in kg 

BRAIN Brain weight of the mammal in g 

SLO_SLP Number of hours of nondreaming sleep 
DREAM_SLP Number of hours of dreaming sleep 
TOTAL_SLEEP Number of hours of total sleep 

LIFE The life span in years 

GESTATE The gestation age 

PREDATION Index of predation as a quantitative variable 
EXPOSURE Index of exposure as a quantitative variable 
DANGER Danger index as a quantitative variable (based on the above two indices)


SMOKE: Greenacre (1984). The data comprise a hypothetical smoking survey in a company. The
variables are: STAFF, SMOKE, FREQ. 


SOCDES: Strahan and Gerbasi (1972). The 20-item version of the Social Desirability Scale was
administered as embedded items in another test to 359 undergraduate students in psychology. 
The social desirability items were scored for the “social desirability” of the response and 
coded as 0’s and 1’s in this SYSTAT data set. 


SOFTWARE1: Musa (1979). The data set consists of failure times (TIME) (in CPU seconds,
measured in terms of execution time) of a real-time command and control software system. 
The variable INTER contains inter-failure times. 
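Assuming TIME holds the cumulative execution time at each failure (the text does not state this explicitly), INTER can be recovered by differencing successive values. The first interval is measured from time 0, which is also an assumption; `inter_failure_times` and the input values are hypothetical.

```python
def inter_failure_times(times):
    """Differences between successive cumulative failure times.

    The first interval is taken from time 0 (an assumption).
    """
    out, prev = [], 0.0
    for t in times:
        if t < prev:
            raise ValueError("failure times must be nondecreasing")
        out.append(t - prev)
        prev = t
    return out

# Made-up cumulative failure times in CPU seconds.
inter = inter_failure_times([5.0, 20.0, 60.0, 75.0])
```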


SOIL: Zinke and Stangenberger. These data were taken from a compilation of worldwide carbon
and nitrogen soil levels for more than 3500 scattered sites. The full data set is available at the 
U.S. Carbon Dioxide Information Analysis Center (CDIAC) site on the World Wide Web. 
The subset included in SYSTAT pertains to the continental U.S. Duplicate measurements at 
single sites are averaged. 


LAT Sample site latitude 
LON Sample site longitude 
STATISTCS$ Mean

CARBON Carbon content in kg/m²
NITRO Nitrogen content in kg/m²
ELEV Sample site elevation in meters 


SPECTRO: Lindberg et al. (1983). The data set was used to fit a spectrographic model to help
determine the amounts of three compounds present in samples from the Baltic Sea: lignin
sulfonate, from pulp industry pollution (LS); humic acids, from natural forest products (HA); and
optical whitener from detergent (DT). The data set consists of 16 samples of known
concentrations of LS, HA, and DT, with spectra based on 27 frequencies (or, equivalently,
wavelengths).


SPECTROMETERS: Two mass spectrometers (SPECTROMTR$) were compared for accuracy in
measuring the ratio of ¹⁴N to ¹⁵N. Three plots of land (PLOT) treated with ¹⁵N were used, and
from every plot two soil samples (SAMPLE) were taken. Each sample had two observations.
The response variable RATIO is the ratio of ¹⁴N to ¹⁵N multiplied by 1000.


RATIO Ratio of two soil measurements. 
SPECTROMTR$ ID of a spectrometer (A, B). 
PLOT Plot number. 

SAMPLE Sample number 


SPIRAL: These data consist of a spiral in three dimensions with the variables X, Y, Z, R, and
THETA. 

SPLINE: Brodlie (1980). These data are X and Y coordinates taken from a figure in Brodlie’s
discussion of cubic spline interpolation. 


SPNDMONY: Chatterjee, Hadi and Price (2000). In this data set, SPENDING is consumer
expenditures, and MONEY is money stock, in billions of dollars, in each quarter of the years
1952-1956 (DATE).
SUBWORLD: The data in the file SUBWORLD are a subset of cases and variables from the
OURWORLD file. 


SUBWRLD2: This data set is a transformation of the SUBWORLD data set. The variables are
transformed to log base 10 units to symmetrize their distributions and are then standardized,
and the cases are sorted in descending GDP_CAP order. Only cases with values for all the
variables have been included.
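The transformation can be sketched in Python: drop incomplete cases, take base-10 logarithms, standardize each variable to mean 0 and standard deviation 1, and sort in descending GDP_CAP order. The sketch assumes all values are positive and uses the sample (n - 1) standard deviation, which may differ from SYSTAT's convention; the helper `subwrld2` and the variable VAR2 are hypothetical.

```python
import math
from statistics import mean, stdev

def subwrld2(rows, variables, sort_by="GDP_CAP"):
    """Complete cases only, log10, standardize, then sort descending by sort_by."""
    complete = [r for r in rows if all(r.get(v) is not None for v in variables)]
    logged = [{v: math.log10(r[v]) for v in variables} for r in complete]
    # Mean and sample standard deviation of each logged variable.
    stats = {v: (mean(r[v] for r in logged), stdev(r[v] for r in logged))
             for v in variables}
    z = [{v: (r[v] - stats[v][0]) / stats[v][1] for v in variables}
         for r in logged]
    return sorted(z, key=lambda r: r[sort_by], reverse=True)

rows = [{"GDP_CAP": 100.0, "VAR2": 10.0},
        {"GDP_CAP": 10000.0, "VAR2": 1000.0},
        {"GDP_CAP": 1000.0, "VAR2": None},     # incomplete case, dropped
        {"GDP_CAP": 1000.0, "VAR2": 100.0}]
out = subwrld2(rows, ["GDP_CAP", "VAR2"])
```

Since log10 and standardization are both increasing transformations, sorting on the transformed GDP_CAP gives the same order as sorting on the raw values.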
SUB_OURWORLD: A subset of the OURWORLD data set in SYSTAT. The variables are:


CTEDUC Expenditure (in US dollars) per person for education in the city 
CTHEALTH Expenditure (in US dollars) per person for health in the city 
RUEDUC Expenditure (in US dollars) per person for education in rural areas
RUHEALTH Expenditure (in US dollars) per person for health in rural areas


SUNSPTDM: Andrews and Herzberg (1985). The data set consists of a calculated relative
measure of the daily number of sunspots compiled from the observations of a number of
different observatories.


YEAR The year of the observations
JAN-DEC The relative measure of sunspots for the indicated month 
ANNUAL The mean relative measure of sunspots for the entire year 
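If ANNUAL is the plain arithmetic mean of the twelve monthly columns (an assumption; it could instead be weighted by the number of days), it can be recomputed from any row. The sketch also assumes the intermediate columns are named FEB through NOV, since the text lists only JAN-DEC; `annual_mean` and the values are hypothetical.

```python
from statistics import mean

# Assumed column names for the JAN-DEC range.
MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]

def annual_mean(row):
    """Arithmetic mean of the twelve monthly sunspot measures in one row."""
    return mean(row[m] for m in MONTHS)

# Made-up monthly values 1 to 12 for a single year.
row = {"YEAR": 1900, **{m: float(k) for k, m in enumerate(MONTHS, start=1)}}
annual = annual_mean(row)
```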


SURVEY2: In Los Angeles (circa 1980), interviewers from the Institute for Social Science
Research at UCLA surveyed a multiethnic sample of 256 community members for an
epidemiological study of depression and help-seeking behavior among adults (Afifi and
Clark, 2004). The CESD depression index was used to measure depression. The index is
constructed by asking people to respond to 20 items: “I felt I could not shake off the blues...,”
“My sleep was restless,” and so on. For each item, respondents answered “less than 1 time per
day” (score 0), “1 to 2 days per week” (score 1), “3 to 4 days per week” (score 2), or “5 to 7
days” (score 3). Responses to the 20 items were summed to form a TOTAL score. Persons with
a CESD TOTAL greater than or equal to 16 are classified as depressed. Variables include:


ID Subject identification number 

SEX 1 = male; 2 = female 

AGE Age in years at last birthday 

MARITAL 1 = never married; 2 = married; 3 = divorced; 4 = separated; 5 = widowed 
EDUCATN 1 = less than high school; 2 = some high school; 3 = finished high school;
4 = some college; 5 = finished bachelor’s degree; 6 = finished master’s degree;
7 = finished doctorate

EMPLOY 1 = full time; 2 = part time; 3 = unemployed; 4 = retired; 5 = houseperson;

INCOME Thousands of dollars per year 

SQRT_INC Square root of income

RELIGION 1 = Protestant; 2 = Catholic; 3 = Jewish; 4 = none; 6 = other

BLUE to DISLIKE Depression items 

TOTAL Total CESD score 

CASECONT 0 = normal; 1 = depressed (CESD ≥ 16)

DRINK 1 = yes, regularly; 2 = no
HEALTHY General health? 1 = excellent; 2 = good; 3 = fair; 4 = poor
CHRONIC Any chronic illnesses in last year? 0 = no; 1 = yes 
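The TOTAL and CASECONT rules described above can be sketched in Python. This is a minimal illustration, not the original scoring program; `cesd_total`, `casecont`, and the respondent's scores are hypothetical.

```python
def cesd_total(items):
    """Sum the 20 CESD item scores (each 0 to 3) into TOTAL."""
    if len(items) != 20:
        raise ValueError("the CESD has 20 items")
    if any(score not in (0, 1, 2, 3) for score in items):
        raise ValueError("each item is scored 0, 1, 2, or 3")
    return sum(items)

def casecont(total, cutoff=16):
    """1 = depressed (TOTAL >= cutoff), 0 = normal."""
    return 1 if total >= cutoff else 0

# Hypothetical respondent: score 1 on 16 items and 0 on the other 4.
total = cesd_total([1] * 16 + [0] * 4)
status = casecont(total)
```

A TOTAL of exactly 16 counts as depressed, matching the "greater than or equal to 16" rule.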


SURVEY3: Marascuilo and Levin (1983) and Cohen (1988). This is a fictitious data set consisting
of responses of 640 men (COUNT) to the question “Does a woman have the right to decide
whether an unwanted birth can be terminated during the first three months of pregnancy?” The
response alternatives were cross-tabulated with religion. RELIGION$ and RESPONSE$ are
represented by ordinal numbers in the data.


SWEAT: Johnson and Wichern (2002). The data set consists of perspiration measurements from
20 healthy females on three variables: sweat rate (SWEAT_RATE), sodium content
(SODIUM), and potassium content (POTASSIUM).


SWETSDTA: Swets, Tanner, and Birdsall (1961), as reported by Swets and Pickett (1982). This
example shows frequency data for two detectors in a study. Each of the subjects in the
experiment used a six-category rating scale (RATING) to indicate his or her confidence that a
signal was present on each of 597 trials when the signal was present, and on 591 randomly
mixed trials on which the signal was not present. The COUNT variable shows the number of
times a subject gave a particular rating to a given signal state. The identifier SUBJ is a numeric
variable in this case.


SYMP: The data set consists of 18 representative symptoms, tallied for how many times they
occur together in 50 diseases. The variables DIM1 and DIM2 are the coordinates in two
dimensions after performing multidimensional scaling on the co-occurrences of symptoms in
the 50 diseases. The other variables, LYME, MALARIA, YELLOW, RABIES, and FLU (5 of the
50 diseases), are dichotomous variables that indicate whether a particular symptom is present.


TABLET: Netmaster Statistics Courses. An experiment was undertaken to compare two methods,
HPLC and NIR, for ascertaining the amount of active content in tablets. The tests were
applied to the same set of ten tablets by breaking each tablet into two halves and applying one
method to each half. The resulting data consist of the variables TABLET, HPLC, and NIR.


TABLET2: The data set is the indexed form of data set TABLET. 
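Assuming "indexed form" means the usual long layout, with one row per tablet and method, the reshaping can be sketched as follows. The column names METHOD and CONTENT, the helper `to_indexed`, and the values are hypothetical, not taken from the TABLET2 file.

```python
def to_indexed(rows, methods=("HPLC", "NIR")):
    """Stack the HPLC and NIR columns into one row per tablet and method."""
    stacked = []
    for row in rows:
        for method in methods:
            stacked.append({"TABLET": row["TABLET"],
                            "METHOD": method,          # hypothetical name
                            "CONTENT": row[method]})   # hypothetical name
    return stacked

# Made-up measurements for two tablets in the wide TABLET layout.
wide = [{"TABLET": 1, "HPLC": 10.1, "NIR": 10.3},
        {"TABLET": 2, "HPLC": 9.8, "NIR": 9.9}]
indexed = to_indexed(wide)
```

The wide layout suits a paired comparison of HPLC and NIR, while the indexed layout suits procedures that expect a single response column with a grouping factor.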


TARGET: The data set is hypothetical. It describes the success of an arrow-throwing machine in
hitting the target. The variables in the data set are:


NOOFTRAILS Number of trials
NOOFEVENTS Number of events 
HEIGHT Height (cm) at which the machine is placed
FORCE Force (newtons) applied to hit the target


TEACH: Mickey et al. (2004). The data set involves two teaching methods and three teachers.
Each teacher uses each teaching method with four different batches of students. The
performance of each batch is measured by the average score of the batch in a common
examination. The variables are SCORE, TEACHER, and METHOD.


TEACHER: Timm (2002). The data set was obtained at the University of Pittsburgh by J. Raffaele
to analyze the reading comprehension and reading rate of students. The teachers were nested
within classes. The classes were noncontract and contract classes. The variables are:


CLASSES$ Types of classes 
TEACHERS$ Teachers 
READRATE Reading rate 


READCOMPRE Reading comprehension 


TETRA: These data are from a bivariate normal distribution. Variables include X, Y, and COUNT
(frequency).


THREAD: Taguchi et al. (1989). The data set consists of the tensile strength (STRENGTH), in
kilograms per millimeter squared, of thread samples, collected every day for two months 
(MONTH) of production. 


TRIAL: These data contain six variables, X(1) to X(5) and SEX$.


TVESP: Hedeker and Gibbons (1996). The data set is from the Television School and Family
Smoking Prevention and Cessation Project. Hedeker and Gibbons looked at the effects of two 
factors on tobacco use for students in 28 Los Angeles schools. One factor involved the use of 
a social-resistance curriculum or not. The other factor was the presence or absence of a 
television intervention. Crossing these two factors yields four experimental conditions, which
were randomly assigned to the schools. Students were measured on tobacco and health 
knowledge both before and after the introduction of the two factors. 


TYPING: These data show the average speeds of typists in three groups, using typing speed
(SPEED) and a character or numeric code for the machine used (EQUIPMNT3$). 
US: State and Metropolitan Area Data Book (1986), Bureau of the Census; The World Almanac
(1971). The variables are:

POPDEN People per square mile
PERSON FBI-reported incidences, per 100,000 people, of personal crimes (murder, rape,
robbery, assault)
PROPERTY Incidences, per 100,000 people, of property crimes (burglary, larceny, auto theft)
INCOME Per capita income
SUMMER Average summer temperature
WINTER Average winter temperature
LABLAT Latitude in degrees at the center of each state
LABLON Longitude at the center of each state
RAIN Average inches of rainfall per year


USCORR: The data set is a correlation matrix among 16 variables from the USSTATES data file.
The variable names are:

ACCIDENT CARDIO CANCER PULMONAR PNEU_FLU
DIABETES LIVER VIOLRATE PROPRATE AVGPAY
TEACHERS TCHRSAL MARRIAGE DIVORCE HOSPITAL
DOCTOR


USCOUNT: Taken from the US data. These data are the means of PERSON (personal crimes) and
PROPERTY (property crimes) within REGION$. The COUNT variable shows the number of
states over which the means were computed.
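The aggregation behind this file can be sketched in Python: group the state records by REGION$, average PERSON and PROPERTY within each group, and record the group size as COUNT. The helper `region_means` and the state values are hypothetical; SYSTAT's own aggregation procedure is not shown here.

```python
from collections import defaultdict

def region_means(states, by="REGION$", measures=("PERSON", "PROPERTY")):
    """Mean of each measure within each group, plus the group size as COUNT."""
    groups = defaultdict(list)
    for state in states:
        groups[state[by]].append(state)
    summary = {}
    for region, members in groups.items():
        n = len(members)
        row = {m: sum(s[m] for s in members) / n for m in measures}
        row["COUNT"] = n
        summary[region] = row
    return summary

# Made-up records for three states in two regions.
states = [{"REGION$": "West", "PERSON": 400.0, "PROPERTY": 5000.0},
          {"REGION$": "West", "PERSON": 600.0, "PROPERTY": 4000.0},
          {"REGION$": "South", "PERSON": 500.0, "PROPERTY": 4500.0}]
result = region_means(states)
```

Keeping COUNT alongside the means lets later analyses weight each region by the number of states it summarizes.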


USINCOME: These data are on the average income (INCOME) of a few regions. The variables
are DIVISION$, COUNT, and INCOME.
USSTATES: State and Metropolitan Area Data Book (1986). The variables are:

REGION and REGION$ Divide the country into four regions
DIVISION and DIVISION$ Divide the country into nine regions
LANDAREA Land area in square miles, 1980
POP85 1985 population in thousands
ACCIDENT Number of deaths by accident per 100,000 people
CARDIO Number of deaths from major cardiovascular disease per 100,000 people
CANCER Number of deaths from cancer per 100,000 people
PULMONAR Number of deaths from chronic obstructive pulmonary disease per 100,000 people
PNEU_FLU Number of deaths from pneumonia and influenza per 100,000 people
DIABETES Number of deaths from diabetes mellitus per 100,000 people
LIVER Number of deaths from chronic liver disease and cirrhosis per 100,000 people
DOCTOR Number of active, nonfederal physicians per 100,000
HOSPITAL Number of hospitals per 100,000 in 1988
MARRIAGE Number of marriages in thousands in 1989
DIVORCE Number of divorces and annulments in thousands in 1989
TEACHERS Number of teachers in thousands
TCHRSAL Average salary for teachers for the 1990 year
HSGRAD Number of public high school graduates in the 1982-83 school year
AVGPAY Average annual pay for a worker in 1989
TOTALSLE Total sale
VIOLRATE Violent crime rate per 100,000 people in 1989
PROPRATE Rate of property crimes per 100,000 people in 1989
PERSON Number of persons who commit crimes
POP90 Population in thousands in 1990 as cited in the New York Times
ID$ Name of each state in the United States
COUNT Number associated with the state
MSTROKE and FSTROKE Risk of stroke per 100,000 males and females (adjusted to weight
each state’s various age groups equally)
INCOME89 Median household income in 1989
INCOME Income in 1991
BUSH, PEROT, and CLINTON Vote count in thousands for each candidate in the 1992
presidential election
ELECVOTE Number of electoral votes each state received in the 1992 presidential election
PRES_88$ Number of electoral votes each state received in the 1988 presidential election
GOV_93$ Newly elected governor’s political party in each state after winning the 1993
gubernatorial races
GOV_92$ Winning political parties in the 1992 gubernatorial races
POVRTY91 Census Bureau’s estimate of the percentage of Americans living below the
poverty level in 1991
POVRTY90 Poverty estimates for 1990
TORNADOS Number of tornados per thousand square miles from 1953 to 1991


HIGHTEMP Average high temperature
LOWTEMP Average low temperature
RAIN Average annual rainfall
SUMMER Average summer temperature
WINTER Average winter temperature
POPDEN Population density
LABLON, LABLAT Longitude and latitude at the center of the state according to the
World Almanac and Book of Facts (1992), Pharos Books, New York
GOVSLRY Salaries for U.S. governors


USVOTES: This data file breaks down the votes for CLINTON, BUSH, and PEROT by
DIVISION$.


WESTWOOD: Neter, Kutner, Nachtsheim and Wasserman (1996). A spare part is manufactured
by the Westwood Company once a month. The lot sizes manufactured vary from month to 
month because of differences in demand. These data show the number of man-hours of labor 
for each of 10 lot sizes manufactured. The variables are PROD_RUN, LOT_SIZE, and 
MAN_HRS. 


WILL: Williams (1986). RESPONSE is the dependent variable, LDOSE is the logarithm of the
dose (stimulus), and COUNT is the number of subjects with that response. 


WILLIAMS: Cochran and Cox (1957). These data are from a crossover design for an experiment
studying the effect of three different feed schedules (FEED) on milk production by cows 
(MILK). The design of the study has the form of two 3 x 3 Latin squares. PERIOD represents 
the period. RESIDUAL indicates the treatment of the preceding period. Other variables 
include number assigned to the cow (COW) and the Latin square number (SQUARE). 


WILLMSDM: Hubert (1984). This data set contains the results of a bioassay conducted to
determine the concentration of nicotine sulfate required to kill 50% of a group of common
fruit flies. The experimenters recorded the number of fruit flies killed at different
dosage levels. The variables are:

RESPONSE The dependent variable, which is the response of the fruit fly to the dose of
nicotine sulfate (stimulus)
LDOSE The logarithm of the dose
COUNT The number of fruit flies with that response


WINER: Winer (1971). The data are from a design with two trials (DAY(1-2)), one covariate
(AGE), and one grouping factor (SEX). 
WORDS: Carroll, Davies, and Richman (1971). The data set contains the most frequently used
words (WORD$) in American English. Three measures have been added to the data. The first 
is the (most likely) part of speech (PART$). The second is the number of letters (LETTERS) 
in the word. The third is a measure of the meaning (MEANING). This admittedly informal 
measure represents the amount of harm done to comprehension (1 = a little, 4 = a lot) by 
omitting the word from a sentence. 


WORLD: Global mapping. The variables include MAPNUM, MAXLAT, MINLAT, MINLON, 
MAXLON, LABLAT, LABLON, and COLOR$. 


WORLD95M: For each of 109 countries, 22 variables were culled from several 1995 almanacs,
including life expectancy, birth rate, the ratio of birth rate to death rate, infant mortality, gross 
domestic product per capita, female and male literacy rates, average calories consumed per 
day, and the percentage of the population living in cities. 


WORLDDM: Wilkinson, Blank, and Gruber (1996). This data set contains 1990 information on
30 countries, including birth and death rates, life expectancies (male and female), types of
government, whether mostly urban or rural, and latitude and longitude. The variables are:


COUNTRY$ Country name


BIRTH_RT Number of births per 1000 people in 1990 
DEATH_RT Number of deaths per 1000 people in 1990
MALE Years of life expectancy for males 

FEMALE Years of life expectancy for females 

GOV$ Type of government 

URBAN$ Rural or urban 

LAT Latitude of the country’s centroid 

LON Longitude of the country’s centroid 


YOUTH: Harman (1976). This data set is a correlation matrix of measurements recorded for 305
females aged seven to seventeen: height, arm span, length of forearm, length of lower leg, 
weight, bitrochanteric diameter (the upper thigh), torso girth, and torso width. 
Acronym & Abbreviation Expansions


A 

ABS - absolute value 

ACF - autocorrelation function 

ACOLOR - color axes 

ACS - arccosine 

ACT - actuarial life table 

AD test - Anderson Darling test 

ADDTREE - additive trees 

ADFG - asymptotically distribution free estimate 
biased, Gramian 

ADFU - asymptotically distribution free estimate 
unbiased 

ADJSEASON - seasonal adjustment 

AHMAX - maximum extent 

AHMIN - minimum extent 

AIC - Akaike information criterion 

AID - automatic interaction detection 

ALT - alternative 

ANCOVA - analysis of covariance 

ANG1 - deviation of angles from north in a
clockwise direction 

ANG2 - deviation of angles from horizontal (for
3D models) 

ANG3 - tilt angle 

ANOVA - analysis of variance 

ANOVAHYPO - hypothesis tests in analysis of 
variance 

AR - autoregressive 

ARIMA - autoregressive integrated moving 
average 

ARL - average run length 
ARMA - autoregressive moving average
ARS - adaptive rejection sampling 
ASCII - American Standard Code for 
Information Interchange 

ASE - asymptotic standard error 

ASN - arcsine 

ATH - arc hyperbolic tangent 

ATN - arctangent 

AVERT - vertical extent 

AVG - average 


B 

BC - Bray-Curtis similarity measure 
BCa - bias-corrected and accelerated
BCF - Beta cumulative function 
BDF - Beta density function 
BETACORR - beta correction 

BIC - Bayesian information criterion 
BIF - Beta inverse function 

BMP - Windows bitmap 

BOF - beginning-of-file 

BOG - beginning-of-BY group 
BONF - Bonferroni 

BOOT - bootstrap 

BRN - Beta random number 


C

CART - classification and regression trees 
CBSTAT - column basic statistics 

CCF - Cauchy cumulative function 

CCF - cross-correlation function 

CDF - Cauchy density function 

cdf/CF - cumulative distribution function 
CDFUNC - coefficients for canonical variables 


CFUNC - coefficients for the classification 
functions 

CGM - Computer graphics metafile: binary or 
clear text 

CHAZ - cumulative hazard 

CHISQ - Chi-square distribution 

CHOL - Cholesky decomposition 

CI - confidence interval 

CIF - Cauchy inverse function 

CIM - confidence interval of mean 
CLASS - classification 

CLSTEM - stem and leaf plot for column 
CMeans - canonical scores of group means 
CMULTIVAR - multiple string variables 
COEF - coefficients 

COL/col - column 

COLPCT - Column percentages 
CONFIG - configuration 

CONT - Contingency coefficient 

CONV - convergence 

CORAN - correspondence analysis 
CORR - correlations 

CORR1 - single correlation coefficient
CORR2 - equality of two correlations 
COV - covariance 

Cp - process capability index 

CPL - process capability based on lower 
specification limit 

CPU - process capability based on upper 
specification limit 

Cpk - process capability index for off-centered
process 

CR - confidence region 

CRA - cost of response above UTL 

CRB - cost of response below LTL 

CRN - Cauchy random number 
CSCORE - canonical scores 

CSIZE - size of characters 

CSQ - Chi-square 

CSTATISTICS - column statistics 

CSV - comma separated values 


CUSUM - cumulative sum 

CUSUM HI - Upper cumulative sum 
CUSUM LO - Lower cumulative sum 
CV - coefficient of variation 

CVI - cross validation index 


D 

DBF - dBASE files

DC - deciles of risk 

DECF - Double exponential cumulative function 
DEDF - Double exponential density function 
DEIF - Double exponential inverse function 
DENFUN - density function 

dep. - dependent 

DERN - Double exponential random number 
DET - determinant 

DEVI - deviates (observed values - expected 
values) 

DEXP - Double exponential distribution 

df - degrees of freedom 

DF - distribution function 

DHAT - estimated distance 

DIF - data interchange format 

DIM - dimension 

DISCRIM - discriminant analysis 

DIST - distance 

DIT - dot histogram 

DOE - design of experiments 

DOS - disk operating system

DPMO - defects per million opportunities 
DPU - defects per unit 

DTA - Stata files 

DUCF - Discrete uniform cumulative function
DUDF - Discrete uniform density function 
DUIF - Discrete uniform inverse function 
DUNIFORM - Discrete uniform 

DURN - Discrete uniform random number 
DWLS - distance weighted least-squares 


E 
ECF - Exponential cumulative function 
EDF - Exponential density function
EEXP - extreme value exponential 
EIF - Exponential inverse function 
EIGEN - eigenvalues 

ELAMBDA - exp(lambda) 

EM - expectation-maximization 

EMF - Windows enhanced metafile 
ENCF - Logit normal cumulative function
ENDF - Logit normal density function 
ENIF - Logit normal inverse function 
ENORMAL - Logit normal 

ENRN - Logit normal random number 
EOF - end-of-file 

EOG - end-of-BY group 

EPS - Encapsulated postscript 

ERN - Exponential random number 
ES - exhaustive search 

ESS - error sum of squares 

EW - extreme value Weibull 

EWMA - exponentially weighted moving average
EXP/exp - exponential/expected


F 

FAR - false-alarm rates 

FCF - F cumulative function 
FCOLOR - color foreground 

FDF - F density function 

FIF - F inverse function 

FINV - inverse of the F cumulative 
FITC - fitting distribution: continuous 
FITD - fitting distribution: discrete 
FITDIST - fitting distributions 
Flexibeta - flexible beta 

FPLOT - function plots 

FRN - F random number 

FTD - folded trellis detector 
FTDEV - Freeman-Tukey deviate 
FULLCOND - full conditional 
FUN - function 


G 




GCF - Gamma cumulative function 
GCOR - groupwise correlation matrix 
GCOV - groupwise covariance matrix 
GCV - generalized cross validation 
GDF - Gamma density function 

GECF - Geometric cumulative function 
GEDF - Geometric density function 
GEIF - Geometric inverse function 
GEN - general Toeplitz structure 
GERN - Geometric random number 
GG - Greenhouse Geisser 

GIF - Gamma inverse function 

GIF - Graphics Interchange Format 
GLM - generalized linear models 
GLMHYPO - hypothesis tests in general linear 
model 

GLMPOST - post hoc estimate for repeated 
measures in general linear model 

GLS - generalized least-squares 

GMA - geometric moving average 

GN - Gauss-Newton method 

GOCF - Gompertz cumulative function 
GODF - Gompertz density function 
GOIF - Gompertz inverse function 
GORN - Gompertz random number 
GRN - Gamma random number 

GUCF - Gumbel cumulative function
GUDF - Gumbel density function
GUIF - Gumbel inverse function
GURN - Gumbel random number


H 

H & L - Hosmer and Lemeshow

HC - heteroscedasticity-consistent 

HCF - Hypergeometric cumulative function
HDF - Hypergeometric density function 
HF - Huynh-Feldt

HGEOMETRIC - hypergeometric 

HIF - Hypergeometric inverse function 
HIST - histogram 

HKB - Hoerl, Kennard, and Baldwin 


H-L trace - Hotelling-Lawley trace

HR - hit-rates 

HRN - Hypergeometric random number 
HSD - honestly significant differences 
HTERM - terms tested hierarchically 
HTML - hypertext markup language
HYMH - hybrid Metropolis-Hastings 


I 

IF - Inverse cumulative distribution function 
IGAUSSIAN - inverse Gaussian 

IGCF - Inverse Gaussian cumulative function 
IGDF - Inverse Gaussian density function 
IGIF - Inverse Gaussian inverse function 
IGRN - Inverse Gaussian random number 
IIDMC - independently and identically 
distributed Monte Carlo 

IMPSAMPI - importance sampling integration 
IMPSAMPR - importance sampling ratio 
I-MR - individual and moving range 
Ind/indep - independent 

IndMH - Independent Metropolis-Hastings 
INDSCAL - individual differences scaling 
INITSAMP - initial sample 

INTEG FUN - integrated function 

IPA - iterated principal axis 

ITER - iterations 


J 

JACK - jackknife 

JCLASS - jackknifed classification 

JMP - JMP v3.2 data files 

JPEG/JPG - joint photographic experts group 


K 

K-M - Kaplan-Meier 

KNBD - kth nearest neighborhood 

KRON - Kronecker product 

K-S test - Kolmogorov-Smirnov test 

KS1 - one sample Kolmogorov-Smirnov tests 
KS2 - two sample Kolmogorov-Smirnov tests 


L 

LAD - least absolute deviations 

LB - larger the better 

LCF - Logistic cumulative function 
LCHAZ - log cumulative hazard 

LCL - lower control limit 

LCONV - log-likelihood convergence criteria 
LDF - Logistic density function 

LGM - log gamma 

LGST - logistic 

LIF - Logistic inverse function 

L-L/LL - log likelihood 

LMS - least median of squares
LMSREG - least median of squares regression 
LNCF - Lognormal cumulative function 
LNDF - Lognormal density function 
LNIF - Lognormal inverse function 
LNOR/LNORMAL - lognormal 

LNRN - Lognormal random number 
loc - location 

LOG1 - one-parameter logistic (Rasch)
LOG2 - two-parameter logistic 

LOGIT - logistic regression 
LOGITHYPO - hypothesis tests in logistic 
regression 

LOGLIN - loglinear modeling 

LR - likelihood ratio 

LRCHI - likelihood ratio chi-square 
LRDEV - likelihood ratio of deviate 
LRN - Logistic random number 

LS - least-squares 

LSD - least significant difference 

LSL - lower specification limit 

LSQ - least-squares 

LTAB - life tables 

LTL - lower tolerance limit 

LW - Lawless and Wang 


M 
MA - moving average 




MAD - mean absolute deviation 

MAHAL - Mahalanobis distances 

MANCOVA - multivariate analysis of covariance 
MANOVA - multivariate analysis of variance 
MANOVAHYPO - hypothesis tests in 
MANOVA 

MANOVAPOST - post hoc estimate for repeated 
measures in MANOVA 

MAR - missing at random 

MAX - maximum 

MAXSTEP - maximum number of steps 

MCAR - missing completely at random 

MCMC - Markov Chain Monte Carlo 
MDPREF - multidimensional preference 

MDS - multidimensional scaling 

MIN - minimum 

M-H - Metropolis-Hastings

MIS - number of missing values 

MIX - mixed regression 

MIXHIER - mixed regression for data having a 
hierarchical structure 

MIXMULTY - mixed regression for data having 
a multivariate structure 

ML - Maximum Likelihood 

MLA - maximum likelihood analysis 

MLE - maximum likelihood estimate 

MML - maximum marginal likelihood 

MRC - Multiple Regression and Correlation 

MS - mean squares 

MSE - mean square error 

MSIGMA - sigma measurement 

MT - Mersenne-Twister 

MTW - MINITAB v11 data files 

MU2 - Guttman's mu2 monotonicity coefficients 
MULTIVAR - multiple variables 

MW - minimum within sum of squares deviations 
MWL - maximum Wishart likelihood 


N 
NAR - non-stationary first-order autoregressive 
NB - nominal the best 




NBB - nominal-the-best: bilateral tolerance 
NBCF - Negative binomial cumulative function 
NBD - number of active bounds on parameter 
values 

NBDF - Negative binomial density function 
NBIF - Negative binomial inverse function 
NBINOMIAL - Negative binomial 

NBRN - Negative binomial random number 
NBU - nominal-the-best: unilateral tolerance 
NCAT - number of categories 

NCF - Binomial cumulative function 

NCOL - number of columns 

NDF - Binomial density function 

NDMAX - maximum number of points 
NDMIN - minimum number of points 

NEM - number of EM iterations 

NEXPO - negative exponential 

NIF - Binomial inverse function 

NIPALS - Nonlinear iterative partial least squares
NLAG - number of lags 

NLLOSS - nonlinear loss functions 
NLMODEL - nonlinear models 

NMIN - minimum count 

NMULTIVAR - multiple numeric variables 
NONLIN - nonlinear models 

NP - Number nonconforming

NPAR - nonparametric 

NREC - non-recreationist 

NRN - Binomial random number 

NROW - number of rows 

NRP - number of apparently redundant 
parameters 

NSAMP - number of sub-samples 

NSPLIT - maximum number of splits 

NX - number of nodes along the x axis 
NXDIS - number of discretization points in the x 
(North) direction 

NY - number of nodes along the y axis 
NYDIS - number of discretization points in the y 
(East) direction 

NZ - number of nodes along the z axis 






NZDIS - number of discretization points in the z 
(Depth) direction 


O

Obs - observed

OBSFREQ - observed frequency 

OC - operating characteristic 

ODBC - open database connectivity
OFREQ - outlier frequencies 

OLS - ordinary least-squares 
ORTHEQ - Equally Spaced Orthogonal
component

ORTHUN - Unequally Spaced Orthogonal
component 


P 
P - Proportion nonconforming 
PACF - Pareto cumulative function 
PACF - partial autocorrelation function 
PADF - Pareto density function 
PAIF - Pareto inverse function 
PARAM - parameters 
PARN - Pareto random number 
PCA - process capability analysis 
PCF - iterated principal axis factoring 
PCF - Poisson cumulative function 
PCNTCHANGE - percentage change 
PCT - Macintosh PICT 
PDF - Poisson density function 
pdf - probability density function 
PDL - polynomial distributed lag 
PERMAP - perceptual mapping 
PIF - Poisson inverse function 
PLIMITS - probability limits 
PLS - partial least squares
pmf - probability mass function
PMIN - minimum proportion 
PNG - Portable Network Graphics 
POLY - polygon 
POSAC - partially ordered scalogram analysis 
with coordinates 


P-P - probability plot 

PP - process performance 

Ppk - Process performance index for off-centered 
process 

PPL - process performance based on lower 
specification limit 

PPM - parts per million 

PPU - process performance based on upper 
specification limit 

PRE - percentage reduction error 
PREFMAP - preference mapping 

PRN - Poisson random number 

PROB - probability 

PROP1 - single proportion

PROP2 - equality of two proportions

PS - PostScript 

PVAF/p.v.a.f. - present value annuity factor
p-value - probability value 


Q 

QC - quality control 

QMLE - quasi maximum likelihood estimate 
QNTL - quantiles 

QPLOT - quantile plots 

Q-QPLOT - two sample quantile plot 
QRD - QR decomposition 

QS - quick search 

QSK - quantitative symmetric similarity 
coefficients (or Kulczynski measure) 
QUASI - Quasi-Newton method 


R 

R & R - repeatability and reproducibility

R chart - range chart 

RADMAX - maximum horizontal direction for 
the search radius 

RADMIN - minimum horizontal direction for the 
search radius 

RAND - random 

RANDSAMP - random sampling 

RANKREG - rank regression 
RBSTAT - row basic statistics

RCF - Rayleigh cumulative function 

RDF - Rayleigh density function 
RDISCRIM - robust discriminant 

RDIST - robust distance 

RDVER - vertical direction for the search radius 
REPAR - reparametrize 

REPS - replicates 

RESID - residuals 

RIF - Rayleigh inverse function 

RJS - rejection sampling 

RMS - root mean square 

RMSEA - root mean square error of 
approximation 

RMSSTD - root mean square standard deviation 
ROC - receiver operating characteristic 
ROWPCT - Row percentages 

RRN - Rayleigh random number 

RS - response surface 

RSE - robust standard errors

RSEED - random seed 

RSM - response surface methods

RSQ - stress and squared correlation 

RSS - residual sum of squares 
RSTATISTICS - row statistics 

RTF - rich text format 

RWM-H - random walk Metropolis-Hastings 
RWSTEM - stem and leaf plot for rows 


S 

S chart - standard deviation control chart 
SANG1 - angle (in degrees) of the first minor axis
of the search ellipsoid 

SANG2 - angle (in degrees) of the major axis of 
the search ellipsoid 

SANG3 - angle (in degrees) of the second minor 
axis of the search ellipsoid 

SAV - SPSS files 

SB - smaller the better 

sc - scale 

SC - set correlation 




SCDFUNC - standardized coefficients for 
canonical variables 

SCF - Studentized cumulative function 
SD - standard deviations 

sd2/sas7bdat - SAS v9 files 

SDF - Studentized density function 
SE/se/S.E. - standard error 

SEK - standard error of kurtosis 

SEM - standard error of mean 

SES - standard error of skewness 

shp - shape 

SIF - Studentized inverse function 
SIMPLS - Straightforward Implementation of
Partial Least Squares 

SKMEAN - simple kriging mean 

SL - specification limit 

SMIN - minimum split value 

SPLOM - scatter plot matrix 

SQL - structured query language 
SQRT/SQR - square-root 

SRN - Studentized random number 
SRWR - sum of rank weighted residuals 
SS - sum of squares 

SSCP - sum of squares and cross products 
STA - Statistica v5 data files 

STAND - standardized deviates 

SVD - singular value decomposition 
SW - Shapiro-Wilk

SYC/CMD - SYSTAT command files
SYZ/SYD/SYS - SYSTAT data files 
SYO - SYSTAT output files 


T 

T1 - one-sample t-test 

T2 - two-sample t-test 

TANALYZE - Taguchi design: analyze 
TCF - t cumulative function 

TCOR - total correlation 

TCOV - total covariance 

TDF - t density function

TESTAT - Test Item Analysis 


TESTATCL - classical test item analysis 
TESTATLOG - logistic item response analysis 
TETRA - tetrachoric correlations 
TGENERATE - Taguchi design: generate 

TIF - t inverse function 

TIFF - Tagged Image File Format 

TLOG - log time 

TLOSS - Taguchi's Loss Function 

TNH - hyperbolic tangent 

TOHC0 - Hypothesis Testing: Zero correlation
TOHC1 - Hypothesis Testing: Specific
correlation 

TOHC2 - Hypothesis Testing: Equality of two 
correlation coefficients 

TOHP1 - Hypothesis Testing: Single proportion
TOHP2 - Hypothesis Testing: Equality of two 
proportions 

TOHT1 - Hypothesis Testing: One sample t-test
TOHT2 - Hypothesis Testing: Two sample t-test 
TOHTPAIRED - Hypothesis Testing: Paired t-test

TOHV1 - Hypothesis Testing: Single variance 
TOHV2 - Hypothesis Testing: Two variances 
TOHVN - Hypothesis Testing: Several variances 
TOHZ1 - Hypothesis Testing: One sample z-test 
TOHZ2 - Hypothesis Testing: Two sample z-test 
TOL - tolerance 

TPLOT - time series plot 

TPREDICT - Taguchi design: predict 

TRCF - Triangular cumulative function 

TRDF - Triangular density function 

TRI - triangular 

TRIF - Triangular inverse function 

TRIM - trimmed mean 

TRN - t random number 

TRP - transpose 

TRRN - Triangular random number 
TSFOURIER - Fourier decomposition of time 
series 

TSIV - Two-Stage Instrumental Variables 
TSLS - Two-Stage Least Squares 


TSP - traveling salesman path 

TSQ chart - Hotelling's T² chart
TSSMOOTH - smoothing time series 
TXT - text format 


U 

U chart - chart showing defects per unit 
UCF - Uniform cumulative function 
UCL - upper control limit 

UDF - Uniform density function 

UIF - Uniform inverse function 

UNCE - uncertainty coefficient 

URN - Uniform random number 

USL - upper specification limit 

UTL - upper tolerance limit 


V
VAR - variance 
VIF - variance inflation factor 


W 

WB - Weibull 

WCF - Weibull cumulative function 
WCOR - pooled within-group correlation 
WCOV - pooled within-group covariance 
WDF - Weibull density function 
WHISKER - Box-and-Whisker plot 

WIF - Weibull inverse function 

WMF - Windows metafile

WRN - Weibull random number 


X 

XCF - Chi-square cumulative function 
XDF - Chi-square density function 

XIF - Chi-square inverse function 

XLAG - separation distance between lags 
XLS - Excel format

XLTOL - tolerance for lags 

XMAX - maximum along x axis 

XMIN - minimum along x axis 
X-MR chart - Individuals and moving range chart
XPT/TPT - SAS transport files 

XRN - Chi-square random number 

XTAB - Crosstabulations 


Y
YMAX - maximum along y axis 
YMIN - minimum along y axis 


Z 

Z1 - one-sample z-test 

Z2 - two-sample z-test 

ZCF - Normal cumulative function 
ZDF - Normal density function 
ZICF - Zipf cumulative function 
ZIDF - Zipf density function 
ZIF - Normal inverse function 
ZIIF - Zipf inverse function 
ZIRN - Zipf random number 
ZMAX - maximum along z axis 
ZMIN - minimum along z axis 
ZRN - Normal random number 


Index


&, 194 
@, 180 


A 


accelerator keys, 256 
access keys, 256, 259, 261 
active data file, 68 
active tab, 77 
add empty row, 74 
Add Examples, 187 
Advanced menu, 76 
align 

graphs, 74 

tables, 74 

text, 74 
Alt key, 81, 248, 259 


analysis of variance 
one-way, 125 
post hoc tests, 217 
two-way ANOVA, 132, 217 


Analyze menu, 76 
application gallery, 86, 283 
ASCII files, 74, 93 
Autocomplete, 272 
Autorecovery, 190 


B 

bar charts, 127, 134 
bitmaps, 74, 233 
BMP, 233 


Bonferroni adjusted probabilities, 113, 139 


boxplots, 124 
Bubble Help, 268 


buttons 
appearance, 255 
customization, 252 
Discussion, 85 
in Help system, 83 
Reset, 255 
shortcut keys, 256 
toolbars, 253, 255 
tooltips, 255 


C 


CAP, 247 
Case Selection, 246 
Invert, 253 


CGM, 74, 233 
CLASSIC, 276 
clipboard 




command submission from, 193 


cut selection, 256 
export results, 233 


submitting commands, 272 


cold commands, 176 
collapsible link, 67 
collapsing, 67 
expanding, 67 
command buffer, 272 
command files, 71 
comments, 188 
creating, 181, 193 
editing, 181, 193 
lists, 263 
opening, 185 
printing, 186 
saving, 183 


submitting, 152, 181, 186, 193 


Command folder, 85, 279 
command pane 


Command pushbuttons, 79 
command shortcuts, 179 


@, 180 
ellipsis, 179 
command syntax, 175 
argument, 175 
module name, 175 
option, 175 
option value, 176 
command templates 
see templates 
commands, 173 
abbreviating, 176 
case sensitivity, 176 
clipboard submission, 193 
cold, 176 
comments, 188 
controlling output, 189 
creating command files, 181 
delimiters, 176 
DOS, 192 
editing, 181 
entering, 173 
files, 172, 181 
help, 180 
hot, 176 
interactive, 172, 173 
log, 172, 189 
long filenames, 176 
multiline commands, 176 
multiple transformations, 180 
quotation marks, 178 
recalling, 176 
running, 172 
spaces in filenames, 178 
submitting, 181, 186, 189, 193 
syntax, 175, 176 
tokens, 193 
Commandspace, 71, 102, 172 
batch, 72, 152, 172 
closing tabs, 78 
context menu, 78 
customization, 240, 241 
docking, 240 
fonts, 172 
hiding, 241 


interactive, 72 

interactive tab, 172, 173 

keyboard controls, 256 

log tab, 72, 172, 189 

moving, 240 

resizing, 241, 245 

shortcut keys, 256 

showing, 241 

undocking, 240 

untitled tab, 72, 172, 181 
comments 

!!, 188 

REM, 188 
computer graphics metafiles, 233 
context menu, 77, 190, 248, 252, 261 

batch tab, 190 

Commandspace, 78, 186 

data editor, 77 

Examples, 78 

Examples tab, 187 

Graph Editor, 77 

Log tab, 189 

output editor, 77 

Output Organizer, 77 

Startpage, 77 

toolbar area, 78 

variable editor, 77 
correlation, 113 
crosstabulation, 107 
CTRL key, 256 
Customize dialog, 73 

Commands tab, 248 

Keyboard tab, 260 

Toolbars tab, 254 
customizing menus and toolbars, 248 


D 


data, 279 
entering, 89 
data editor, 67, 74 
cell entry, 253 
context menu, 77 
first case, 253 
Invert Case Selection, 253 
last case, 253 
next case, 253
previous case, 253 
data files, 68 
active, 68 
viewing, 68 
Data folder, 279 
Data menu, 75 
Descriptive Statistics, 109 
dialog boxes, 79 
additional features, 81 
check boxes, 81 
command pushbuttons, 79 
command templates, 195 
edit texts, 81 
pushbuttons, 80 
radio buttons, 81 
right-click, 82 
selecting variables, 81 
source variable list, 80 
special lists, 80 
tabs, 79 
target variable list(s), 80 
directories 
file locations, 279 
DOS commands, 187, 192 
errors, 192 
graphs, 192 
mht, 193 
minimized, 193 
opening, 192 
output, 193 
quitting, 193 
saving, 193 
submitting, 192 
switches, 192 
drag and drop, 248, 249, 255 
Dynamic Explorer, 71 
dynamic explorer, 136 


E 


ECHO, 246 
echo commands, 276 


Edit menu, 74 
Data Editor, 74 




Find, 74 
Graph Editor, 74 
output editor, 74 
Output Organizer, 74 
Redo, 74 
Replace, 74 
Undo, 74 
EMF, 232 
encapsulated postscript files, 232 
entering data, 89 
EPS, 232, 233 
Examples, 71 
Examples tab, 78, 242 
Collapse All, 78 
context menu, 78 
customizing, 242 
Expand All, 78 
ini file, 244 
opening command files, 78
run, 78 
Excel files, 74 
exponential distribution, 213
exporting 
graphics, 233, 234 


F 


F10 key, 256 

F9 key, 176 

File menu, 74 
importing, 74 

file paths, 279 

filenames 
long names, 178 
spaces in, 178 
substituting for tokens, 198, 209 

fonts 

FORMAT, 280 

Format, 74 
Align, 74 
Bulleted List, 74 
Collapse Tree, 74 
insert page breaks, 74 
Numbered List, 74 


Format Bar, 67, 253 
formatting toolbar 

see Format Bar, 253 
FPATH, 281 
frequency tables, 105 
Full screen Viewspace, 75 


G 
GIF, 74, 233 
global options, 270 
Glossary, 86 
GPRINT, 236 
GRAPH, 281 
graph 
panning, 76 
preview, 77 
realign frames, 75 
templates for graph options, 216 
viewing, 73 
zooming, 76 
graph editing 
Graph Editing toolbar, 253 
Graph Editing toolbar, 70 
Graph Editor, 69 
close, 77 
context menu, 77 
properties, 77 
Graph menu, 75 
annotation, 76 
Edit, 70 
Lasso, 76 
Overlay, 75 
Realign, 75 
Zoom, 70 
Graph Properties dialog, 77 
graph toolbar, 253 
graphs, 65 
animate, 71 
exporting, 233, 234 
printing, 236 
saving, 229, 232, 233 
grouping variables 
in scatterplots, 101 


GSAVE, 233 


H 


help, 82 
examples, 84 
navigating, 82 
online glossary, 86 

Help menu, 76 
Contents, 82 
Search, 83 

Help system, 82 
Contents, 82 
Favorites, 83 
Hide, 83 
Index, 82 
Refresh, 83 
toolbar, 83 

hot commands, 176 

HTML format, 74, 230 


I 
IMMEDIATE, 206 


insert, 74 
case, 74 
image, 74 
page break, 74 
insertion, 245 
integers 


substituting for tokens, 204, 211, 212, 213 


interactive tab 
recalling commands, 176 


J 


JMP files, 74 
JPEG files, 232, 233 
JPG, 233 


K 


keyboard shortcuts, 256, 260, 269 
Keyboard tab, 260 
L


landscape orientation, 236 
LDISPLAY, 280 
license, 76 


linear regression 
examples, 215 


listing data, 102 
Log tab, 72 
logistic distribution, 213 


M 
Macintosh PICT files, 232 
menu animation, 262 


menus, 73 
Advanced, 76 
Analyze, 76 
data, 75 
edit, 74 
file, 74 
graph, 75 
help, 76 
Quick Access, 76 
themes, 269 
utilities, 75 
view, 75 
Window, 76, 244 

metafiles, 232 

MHT, 74 

MINITAB files, 74 

modules, 174 

monospaced output, 275 


N 
normal distribution, 211, 212, 213 
NUM, 245 


numbers 
substituting for tokens, 204, 211, 212 


O


one-way analysis of variance, 125 
orientation, 236 




output 
commands, 231 
directing to a file, 231 
directing to a printer, 231 
HTML format, 230 
organizing, 276 
printing graphs, 236 
rich text format, 230 
saving, 229, 230 
saving graphs, 232 
output editor, 67, 221 
alignment, 221 
collapsible link, 67 
context menu, 77 
customization, 244 
find text, 223 
graphs, 221, 222 
maximizing, 244 
preview, 77 
refresh, 77 
right-click editing, 224 
tables, 221 
view source, 77 
Output format, 274 
output options, 273 
Output Organizer, 71 
captions, 242 
closing files, 226 
closing folders, 225 
Collapse Tree, 74 
configuring, 227 
context menu, 77 
customizing, 242 
detailed node captions, 77 
dragging entries, 226 
Expand tree, 74 
hiding, 228, 244 
navigating output, 225 
opening folders, 226 
rename, 77 
reorganizing output, 225, 226 
resizing, 227 
set as active data file, 77 
transformations, 226 
tree folder, 227 
viewing, 227, 244 


Output pane 


P 

PAGE, 280 

page setup, 236 

pairwise comparisons, 139, 219 
PCT, 233 

Pearson correlations, 113 
pixels, 251 

PLENGTH, 246 

PNG, 74, 233 

Portable Network Graphics, 233 
portrait orientation, 236 
PostScript files, 233 


predefined tokens, 207 
file paths, 208 

printing, 235, 236 
graphs, 236 

Processing Conditions, 69 


project directory, 279 
common directory, 279


PROMPT, 205 
proportional output, 274 
PS, 74, 233 


pushbuttons 
commands, 79 
dialog boxes, 80 


Q 


Quick Access menu, 76 
Quick Graphs, 74, 115, 276 


R 


random deviates, 212, 213 
recent dialogs, 266 
Record Script, 191, 268 
regression 

linear, 215 
REM, 188 
reorganizing 


user interface, 72 
Reset All buttons, 248 
Reset button, 255 
Rich Text Format, 230 


S 
SAS files, 74 
saving 
filename substitution, 198 
graphs, 229, 232, 233 
output, 229, 230 
results from statistical analyses, 231 
scatterplot matrices, 115 
scatterplots, 95 
3-D, 120 
grouping variables, 101 
shortcut keys, 256, 260 
smoothers, 97 
sorting cases, 102 
SPLOMs, 115 
S-PLUS files, 74 
SPSS files, 74 
Standard toolbar, 253 
starting SYSTAT, 88 
Startpage, 66 
customization, 245 
STATA files, 74 
Statistica files, 74 
statistics toolbar, 253 
status bar 
context menu, 247 
customization, 247 
hiding, 245 
viewing, 245 
stratification, 111 
strings 
substituting for tokens, 203, 209 
submit, 186 
clipboard, 187 
current line, 187 
from current line to end, 187 
from file list, 263 
selection, 187
window, 190 
Submit Window 
from Log tab, 190 
SYC, 193 
syntax 
see commands 
SYO, 230 
SYSTAT data files, 279 


T 


t test 
two-sample, 121 

Tab key, 81 

templates, 198 
automatic token substitution, 195, 213 
custom prompts, 205 
dialog sequences, 206 
examples, 209, 211, 212, 213, 215, 216, 217 
filename substitution, 198, 209 
IMMEDIATE option, 206 
integer substitution, 204, 211, 212, 213 
interactive substitution, 195 
messages, 197 
multiple instances of a token, 195 
number substitution, 204, 211, 212 
opening files, 198 
ordering tokens, 206 
PROMPT option, 205 
prompting for input, 195 
resetting tokens, 195 
saving files, 198 
string substitution, 203, 209, 213 
variable substitution, 200, 201, 209, 215 
viewing tokens, 207 

themes, 269 
applying, 269 
default, 270 
downloading, 270 
saving, 269 

3-D scatterplots, 120 


TIFF, 233 
TOKEN, 273 




tokens 
see templates 
toolbars, 254 
creating, 254 
default buttons, 253 
deleting, 254 
hiding, 254 
renaming, 255 
supplied with SYSTAT, 253 
tree folder, 227 
Tukey pairwise mean comparisons, 131 
two-sample t test, 121 
two-way analysis of variance, 132, 217 


U 


uniform distribution, 213 
unit of measurement, 179 
untitled tab, 172 


user interface 
Analyze, 76 
commandspace, 65 
data editor, 67 
Data menu, 75 
dynamic explorer, 71 
Edit menu, 74 
File menu, 74 
graph editor, 69 
Graph menu, 75 
help, 82 
Help menu, 76 
Output Organizer, 71 
View menu, 75 
Viewspace, 65 
workspace, 65 

User Menu, 187 

Utilities menu, 75, 253 
Examples, 75 
Macro, 75 
Recent Dialogs, 75
Theme Menus, 66 
User Menu, 75 


V 


value labels 
display, 246

Variable Editor, 77
context menu, 78
processing conditions, 69


variable properties, 69
variable labels
display, 246
Variable tab
see Variable Editor, 68
variables
adding, 209, 213
substituting for tokens, 200, 201, 209, 215
VDISPLAY, 280
view data, 68
View menu, 75
Commandspace, 75
commandspace, 75
processing conditions, 75
Startpage, 75
Workspace, 75
workspace, 75
Viewspace, 66
data editor, 66, 67
full screen, 75
Graph Editor, 69
maximizing, 244
output editor, 66, 67
tile, 244


W
Window, 187
Window menu, 76
arrange, 76
Arrange Icons, 76
Cascade, 76
Tile, 73
Tile Vertically, 73
windows
tiling, 73
WMF, 232
Workspace, 71
customization, 242
Dynamic Explorer, 71
Examples tab, 71
hiding, 242
Output Organizer, 71
resizing, 245
wrapping text, 275


